A significant number of hotel bookings are called off due to cancellations or no-shows. Typical reasons for cancellations include changes of plans and scheduling conflicts. Cancelling is often made easier by the option to do so free of charge or at a low cost, which benefits hotel guests but is less desirable, and possibly revenue-diminishing, for hotels. Such losses are particularly high for last-minute cancellations.
The new technologies involving online booking channels have dramatically changed customers’ booking possibilities and behavior. This adds a further dimension to the challenge of how hotels handle cancellations, which are no longer limited to traditional booking and guest characteristics.
The cancellation of bookings impacts a hotel on various fronts:
The increasing number of cancellations calls for a machine-learning-based solution that can help predict which bookings are likely to be canceled. INN Hotels Group has a chain of hotels in Portugal. They are facing problems with the high number of booking cancellations and have reached out to your firm for data-driven solutions. As a data scientist, you have to analyze the data provided to find which factors have a high influence on booking cancellations, build a predictive model that can predict in advance which bookings are going to be canceled, and help formulate profitable policies for cancellations and refunds.
The data contains the different attributes of customers' booking details. The detailed data dictionary is given below.
Data Dictionary
# Installing the libraries with the specified version.
!pip install pandas==1.5.3 numpy==1.25.2 matplotlib==3.7.1 seaborn==0.13.1 scikit-learn==1.2.2 statsmodels==0.14.1 -q --user
Note: After running the above cell, kindly restart the notebook kernel and run all cells sequentially from the start again.
#Libraries to help with reading and manipulating data
import numpy as np
import pandas as pd
#Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
#Removes the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
#Sets the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)
#Setting the precision of floating numbers to 5 decimal points
pd.set_option("display.float_format", lambda x: "%.5f" % x)
#Library to split data
from sklearn.model_selection import train_test_split
#To build model for prediction
import statsmodels.stats.api as sms
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
#To tune different models
from sklearn.model_selection import GridSearchCV
# To get different metric scores
from sklearn.metrics import (
f1_score,
accuracy_score,
recall_score,
precision_score,
confusion_matrix,
roc_auc_score,
precision_recall_curve,
roc_curve,
make_scorer,
)
#import csv into dataframe
hotel = pd.read_csv("INNHotelsGroup.csv")
# copying data to another variable to avoid any changes to original data
df = hotel.copy()
#Show first five rows of data
df.head()
| Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | INN00001 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00000 | 0 | Not_Canceled |
| 1 | INN00002 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68000 | 1 | Not_Canceled |
| 2 | INN00003 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00000 | 0 | Canceled |
| 3 | INN00004 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00000 | 0 | Canceled |
| 4 | INN00005 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50000 | 0 | Canceled |
#Last 5 rows of the dataset
df.tail()
| Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36270 | INN36271 | 3 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 4 | 85 | 2018 | 8 | 3 | Online | 0 | 0 | 0 | 167.80000 | 1 | Not_Canceled |
| 36271 | INN36272 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 228 | 2018 | 10 | 17 | Online | 0 | 0 | 0 | 90.95000 | 2 | Canceled |
| 36272 | INN36273 | 2 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 1 | 148 | 2018 | 7 | 1 | Online | 0 | 0 | 0 | 98.39000 | 2 | Not_Canceled |
| 36273 | INN36274 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 63 | 2018 | 4 | 21 | Online | 0 | 0 | 0 | 94.50000 | 0 | Canceled |
| 36274 | INN36275 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 207 | 2018 | 12 | 30 | Offline | 0 | 0 | 0 | 161.67000 | 0 | Not_Canceled |
df.shape
(36275, 19)
#show a random sample of five rows of data
df.sample(n=5)
| Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9231 | INN09232 | 2 | 0 | 2 | 2 | Not Selected | 0 | Room_Type 1 | 208 | 2018 | 7 | 30 | Online | 0 | 0 | 0 | 80.75000 | 0 | Canceled |
| 20113 | INN20114 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 48 | 2018 | 11 | 4 | Online | 0 | 0 | 0 | 97.20000 | 2 | Not_Canceled |
| 23076 | INN23077 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 120 | 2018 | 8 | 8 | Online | 0 | 0 | 0 | 127.80000 | 0 | Canceled |
| 14762 | INN14763 | 3 | 0 | 1 | 2 | Meal Plan 2 | 0 | Room_Type 4 | 59 | 2018 | 4 | 22 | Online | 0 | 0 | 0 | 189.00000 | 3 | Not_Canceled |
| 28058 | INN28059 | 1 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 151 | 2018 | 1 | 19 | Offline | 0 | 0 | 0 | 71.00000 | 0 | Not_Canceled |
The dataframe has 36275 rows and 19 columns.
# drop the 'Booking_ID' column from the data set.
df = df.drop('Booking_ID', axis=1)
# view what are the values in object data types
cat_columns = ['type_of_meal_plan', 'room_type_reserved', 'market_segment_type', 'booking_status']
for i in cat_columns:
    print(df[i].value_counts())
    print("*" * 50)
Meal Plan 1     27835
Not Selected     5130
Meal Plan 2      3305
Meal Plan 3         5
Name: type_of_meal_plan, dtype: int64
**************************************************
Room_Type 1    28130
Room_Type 4     6057
Room_Type 6      966
Room_Type 2      692
Room_Type 5      265
Room_Type 7      158
Room_Type 3        7
Name: room_type_reserved, dtype: int64
**************************************************
Online           23214
Offline          10528
Corporate         2017
Complementary      391
Aviation           125
Name: market_segment_type, dtype: int64
**************************************************
Not_Canceled    24390
Canceled        11885
Name: booking_status, dtype: int64
**************************************************
df.describe()
| no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | required_car_parking_space | lead_time | arrival_year | arrival_month | arrival_date | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 36275.00000 | 36275.00000 | 36275.00000 | 36275.00000 | 36275.00000 | 36275.00000 | 36275.00000 | 36275.00000 | 36275.00000 | 36275.00000 | 36275.00000 | 36275.00000 | 36275.00000 | 36275.00000 |
| mean | 1.84496 | 0.10528 | 0.81072 | 2.20430 | 0.03099 | 85.23256 | 2017.82043 | 7.42365 | 15.59700 | 0.02564 | 0.02335 | 0.15341 | 103.42354 | 0.61966 |
| std | 0.51871 | 0.40265 | 0.87064 | 1.41090 | 0.17328 | 85.93082 | 0.38384 | 3.06989 | 8.74045 | 0.15805 | 0.36833 | 1.75417 | 35.08942 | 0.78624 |
| min | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 2017.00000 | 1.00000 | 1.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| 25% | 2.00000 | 0.00000 | 0.00000 | 1.00000 | 0.00000 | 17.00000 | 2018.00000 | 5.00000 | 8.00000 | 0.00000 | 0.00000 | 0.00000 | 80.30000 | 0.00000 |
| 50% | 2.00000 | 0.00000 | 1.00000 | 2.00000 | 0.00000 | 57.00000 | 2018.00000 | 8.00000 | 16.00000 | 0.00000 | 0.00000 | 0.00000 | 99.45000 | 0.00000 |
| 75% | 2.00000 | 0.00000 | 2.00000 | 3.00000 | 0.00000 | 126.00000 | 2018.00000 | 10.00000 | 23.00000 | 0.00000 | 0.00000 | 0.00000 | 120.00000 | 1.00000 |
| max | 4.00000 | 10.00000 | 7.00000 | 17.00000 | 1.00000 | 443.00000 | 2018.00000 | 12.00000 | 31.00000 | 1.00000 | 13.00000 | 58.00000 | 540.00000 | 5.00000 |
df.describe(include = ['object'])
| type_of_meal_plan | room_type_reserved | market_segment_type | booking_status | |
|---|---|---|---|---|
| count | 36275 | 36275 | 36275 | 36275 |
| unique | 4 | 7 | 5 | 2 |
| top | Meal Plan 1 | Room_Type 1 | Online | Not_Canceled |
| freq | 27835 | 28130 | 23214 | 24390 |
# function to create labeled barplots
def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top
    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 2, 6))
    else:
        plt.figure(figsize=(n + 2, 6))
    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )
    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category
        x = p.get_x() + p.get_width() / 2  # horizontal center of the bar
        y = p.get_height()  # height of the bar
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the bar with its count or percentage
    plt.show()  # show the plot
# function to plot a boxplot and a histogram along the same scale.
def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
    """
    Boxplot and histogram combined
    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (12,7))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid= 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a marker will indicate the mean value of the column
    sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter"
    ) if bins else sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2
    )  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram
def stacked_barplot(data, predictor, target, perc=False):
    """
    Print the category counts and plot a stacked bar chart
    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 5, 5,))
    plt.legend(
        loc="lower left", frameon=False,
    )
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
labeled_barplot(df, 'arrival_month', perc=True, n=None)
labeled_barplot(df, 'market_segment_type', perc=True, n=None)
histogram_boxplot(df, 'avg_price_per_room')
# how many free rooms does the hotel give away?
df[df['avg_price_per_room']==0]
| no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 63 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 2 | 2017 | 9 | 10 | Complementary | 0 | 0 | 0 | 0.00000 | 1 | Not_Canceled |
| 145 | 1 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 13 | 2018 | 6 | 1 | Complementary | 1 | 3 | 5 | 0.00000 | 1 | Not_Canceled |
| 209 | 1 | 0 | 0 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 4 | 2018 | 2 | 27 | Complementary | 0 | 0 | 0 | 0.00000 | 1 | Not_Canceled |
| 266 | 1 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2017 | 8 | 12 | Complementary | 1 | 0 | 1 | 0.00000 | 1 | Not_Canceled |
| 267 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 4 | 2017 | 8 | 23 | Complementary | 0 | 0 | 0 | 0.00000 | 1 | Not_Canceled |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 35983 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 7 | 0 | 2018 | 6 | 7 | Complementary | 1 | 4 | 17 | 0.00000 | 1 | Not_Canceled |
| 36080 | 1 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 7 | 0 | 2018 | 3 | 21 | Complementary | 1 | 3 | 15 | 0.00000 | 1 | Not_Canceled |
| 36114 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 3 | 2 | Online | 0 | 0 | 0 | 0.00000 | 0 | Not_Canceled |
| 36217 | 2 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 2 | 3 | 2017 | 8 | 9 | Online | 0 | 0 | 0 | 0.00000 | 2 | Not_Canceled |
| 36250 | 1 | 0 | 0 | 2 | Meal Plan 2 | 0 | Room_Type 1 | 6 | 2017 | 12 | 10 | Online | 0 | 0 | 0 | 0.00000 | 0 | Not_Canceled |
545 rows × 18 columns
df.loc[df['avg_price_per_room']==0, 'market_segment_type'].value_counts()
Complementary    354
Online           191
Name: market_segment_type, dtype: int64
labeled_barplot(df, 'booking_status', perc=True, n=None)
sns.boxplot(data=df, y='avg_price_per_room' , x='market_segment_type')
<Axes: xlabel='market_segment_type', ylabel='avg_price_per_room'>
stacked_barplot(df,'repeated_guest','booking_status')
booking_status  Canceled  Not_Canceled    All
repeated_guest
All                11885         24390  36275
0                  11869         23476  35345
1                     16           914    930
------------------------------------------------------------------------------------------------------------------------
sns.catplot(data=df, y='no_of_special_requests', hue='booking_status', kind='count' )
<seaborn.axisgrid.FacetGrid at 0x7fb82b53c6d0>
Leading Questions:
#group by *arrival month*, count number of records per month, sort from most to fewest bookings, and show top 4 months
df.groupby('arrival_month').count().sort_values(by='booking_status', ascending=False)['booking_status'].head(4)
arrival_month
10    5317
9     4611
8     3813
6     3203
Name: booking_status, dtype: int64
df.groupby('market_segment_type').count().sort_values(by='booking_status', ascending=False)['booking_status']
market_segment_type
Online           23214
Offline          10528
Corporate         2017
Complementary      391
Aviation           125
Name: booking_status, dtype: int64
df.groupby('market_segment_type').agg({'avg_price_per_room':'mean'}).sort_values(by='avg_price_per_room',ascending=False)
| avg_price_per_room | |
|---|---|
| market_segment_type | |
| Online | 112.25685 |
| Aviation | 100.70400 |
| Offline | 91.63268 |
| Corporate | 82.91174 |
| Complementary | 3.14176 |
df['booking_status'].value_counts()
Not_Canceled    24390
Canceled        11885
Name: booking_status, dtype: int64
df.groupby('repeated_guest')['booking_status'].value_counts()
repeated_guest booking_status
0 Not_Canceled 23476
Canceled 11869
1 Not_Canceled 914
Canceled 16
Name: booking_status, dtype: int64
sns.countplot(data=df, hue='booking_status', x='no_of_special_requests')
plt.legend(loc='upper right')
plt.show()
labeled_barplot(df, 'no_of_weekend_nights', perc=True)
labeled_barplot(df, 'no_of_adults', perc=True)
labeled_barplot(df, 'no_of_children', perc=True)
labeled_barplot(df, 'required_car_parking_space', perc=True)
labeled_barplot(df, 'no_of_week_nights', perc=True)
labeled_barplot(df, 'type_of_meal_plan', perc=True)
labeled_barplot(df, 'room_type_reserved', perc=True)
histogram_boxplot(df, 'no_of_previous_cancellations')
labeled_barplot(df, 'lead_time', perc=True)
plt.figure(figsize=(20,10))
sns.heatmap(
    # numeric_only=True makes explicit that the object columns are excluded
    df.corr(numeric_only=True), annot=True, vmin=-1, vmax=1, fmt='.2f')
<Axes: >
# New column for length of stay
df['length_stay'] = df['no_of_weekend_nights'] + df['no_of_week_nights']
sns.pairplot(df[['no_of_weekend_nights','no_of_week_nights','required_car_parking_space',
'lead_time','avg_price_per_room','no_of_special_requests','type_of_meal_plan',
'room_type_reserved','market_segment_type','booking_status','length_stay']]);
df.loc[df['booking_status']=='Not_Canceled','booking_status'] = False
df.loc[df['booking_status']=='Canceled','booking_status'] = True
numeric_columns = df.select_dtypes(include=np.number).columns.tolist()
# drop arrival_year because it is a time variable and not helpful here
numeric_columns.remove("arrival_year")
plt.figure(figsize=(15, 12))
# a single call draws every remaining numeric column on one shared axis
df[numeric_columns].boxplot()
plt.xticks(rotation=45)
plt.show()
There are two columns with heavy outliers: lead_time and avg_price_per_room. I will apply a log transform only to avg_price_per_room, because I am going to bin lead_time, which should handle its outliers.
#Computing the IQR for avg_price_per_room
quartiles = np.quantile(df['avg_price_per_room'][df['avg_price_per_room'].notnull()], [.25, .75])
power_4iqr = 4 * (quartiles[1] - quartiles[0])
print(f'Q1 = {quartiles[0]}, Q3 = {quartiles[1]}, 4*IQR = {power_4iqr}')
outlier_powers = df.loc[np.abs(df['avg_price_per_room'] - df['avg_price_per_room'].median()) > power_4iqr, 'avg_price_per_room']
outlier_powers.shape
Q1 = 80.3, Q3 = 120.0, 4*IQR = 158.8
(49,)
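The rule used above flags a price when it lies more than 4 × IQR from the median. With the printed quartiles and the median of 99.45 from `df.describe()`, the upper cutoff works out to 99.45 + 158.8 = 258.25; the lower cutoff is negative, so only high prices can be flagged. Below is a minimal sketch of the same rule on made-up prices. The helper name `far_from_median` and the toy values are assumptions for illustration and are not part of the notebook.

```python
import numpy as np


def far_from_median(values, k=4.0):
    """Boolean mask: True where a value is more than k * IQR from the median."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.quantile(values, [0.25, 0.75])
    threshold = k * (q3 - q1)
    return np.abs(values - np.median(values)) > threshold


# Toy prices, invented for this example (not taken from the dataset)
prices = np.array([80.0, 90.0, 100.0, 110.0, 120.0, 540.0])
mask = far_from_median(prices)
print(prices[mask])  # only the 540.0 entry is far enough from the median
```

Measuring distance from the median rather than from Q1 and Q3 is the choice the cell above makes; the more common Tukey fences would use `Q1 - k * IQR` and `Q3 + k * IQR` instead.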
# creating a list of columns
dist_cols = [
item for item in df.select_dtypes(include=np.number).columns
]
plt.figure(figsize=(15, 45))
# looping over the list and plotting histograms
for i in range(len(dist_cols)):
    plt.subplot(12, 3, i + 1)
    plt.hist(df[dist_cols[i]], bins=50)
    plt.tight_layout()
    plt.title(dist_cols[i], fontsize=15)
plt.show()
df2 = df.copy()
# removing because they are close to normal
dist_cols.remove('no_of_week_nights')
dist_cols.remove('no_of_adults')
dist_cols.remove('length_stay')
dist_cols.remove('avg_price_per_room')
# removing because they are boolean or time related.
dist_cols.remove('arrival_year')
dist_cols.remove('required_car_parking_space')
dist_cols.remove('arrival_date')
dist_cols.remove('arrival_month')
dist_cols.remove('repeated_guest')
# removing because I have a different treatment in mind
dist_cols.remove('lead_time')
# using log transforms on some columns
for col in dist_cols:
    df2[col + "_log"] = np.log(df2[col] + 1)
# dropping the original columns
df2.drop(dist_cols, axis=1, inplace=True)
df2.head()
| no_of_adults | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | avg_price_per_room | booking_status | length_stay | no_of_children_log | no_of_weekend_nights_log | no_of_previous_cancellations_log | no_of_previous_bookings_not_canceled_log | no_of_special_requests_log | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 65.00000 | False | 3 | 0.00000 | 0.69315 | 0.00000 | 0.00000 | 0.00000 |
| 1 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 106.68000 | False | 5 | 0.00000 | 1.09861 | 0.00000 | 0.00000 | 0.69315 |
| 2 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 60.00000 | True | 3 | 0.00000 | 1.09861 | 0.00000 | 0.00000 | 0.00000 |
| 3 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 100.00000 | True | 2 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| 4 | 2 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 94.50000 | True | 2 | 0.00000 | 0.69315 | 0.00000 | 0.00000 | 0.00000 |
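The `_log` columns above hold `log(x + 1)` of the original counts, so a value of 0 stays 0 and the transform is defined for every row. As a quick check of how to read the table, the sketch below reproduces the values 0.69315 and 1.09861 seen in `no_of_weekend_nights_log` from 1 and 2 weekend nights. It uses only the standard library; the variable name `transformed` is an assumption for illustration.

```python
import math

# log(x + 1) for the small counts that appear in the first rows of the table
transformed = {nights: round(math.log1p(nights), 5) for nights in (0, 1, 2)}

for nights, value in transformed.items():
    print(nights, value)
```

`np.log1p(df2[col])` should give the same result as `np.log(df2[col] + 1)` and is a little more accurate for values close to zero.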
# viewing the distributions after the log transformation.
dist_cols = [
item for item in df2.select_dtypes(include=np.number).columns
]
# plot histogram of all numeric columns
plt.figure(figsize=(15, 45))
for i in range(len(dist_cols)):
    plt.subplot(12, 3, i + 1)
    plt.hist(df2[dist_cols[i]], bins=50)
    sns.histplot(data=df2, x=dist_cols[i], kde=True)
    plt.tight_layout()
    plt.title(dist_cols[i], fontsize=25)
plt.show()
dummy_data = pd.get_dummies (
df2,
columns = [
'type_of_meal_plan',
'room_type_reserved',
'market_segment_type',
],
drop_first=True,
)
dummy_data.head()
| no_of_adults | no_of_week_nights | required_car_parking_space | lead_time | arrival_year | arrival_month | arrival_date | repeated_guest | avg_price_per_room | booking_status | length_stay | no_of_children_log | no_of_weekend_nights_log | no_of_previous_cancellations_log | no_of_previous_bookings_not_canceled_log | no_of_special_requests_log | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Meal Plan 3 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 3 | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Complementary | market_segment_type_Corporate | market_segment_type_Offline | market_segment_type_Online | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2 | 0 | 224 | 2017 | 10 | 2 | 0 | 65.00000 | False | 3 | 0.00000 | 0.69315 | 0.00000 | 0.00000 | 0.00000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 2 | 3 | 0 | 5 | 2018 | 11 | 6 | 0 | 106.68000 | False | 5 | 0.00000 | 1.09861 | 0.00000 | 0.00000 | 0.69315 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 1 | 1 | 0 | 1 | 2018 | 2 | 28 | 0 | 60.00000 | True | 3 | 0.00000 | 1.09861 | 0.00000 | 0.00000 | 0.00000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 2 | 2 | 0 | 211 | 2018 | 5 | 20 | 0 | 100.00000 | True | 2 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 2 | 1 | 0 | 48 | 2018 | 4 | 11 | 0 | 94.50000 | True | 2 | 0.00000 | 0.69315 | 0.00000 | 0.00000 | 0.00000 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
dummy_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 29 columns):
 #   Column                                    Non-Null Count  Dtype
---  ------                                    --------------  -----
 0   no_of_adults                              36275 non-null  int64
 1   no_of_week_nights                         36275 non-null  int64
 2   required_car_parking_space                36275 non-null  int64
 3   lead_time                                 36275 non-null  int64
 4   arrival_year                              36275 non-null  int64
 5   arrival_month                             36275 non-null  int64
 6   arrival_date                              36275 non-null  int64
 7   repeated_guest                            36275 non-null  int64
 8   avg_price_per_room                        36275 non-null  float64
 9   booking_status                            36275 non-null  object
 10  length_stay                               36275 non-null  int64
 11  no_of_children_log                        36275 non-null  float64
 12  no_of_weekend_nights_log                  36275 non-null  float64
 13  no_of_previous_cancellations_log          36275 non-null  float64
 14  no_of_previous_bookings_not_canceled_log  36275 non-null  float64
 15  no_of_special_requests_log                36275 non-null  float64
 16  type_of_meal_plan_Meal Plan 2             36275 non-null  uint8
 17  type_of_meal_plan_Meal Plan 3             36275 non-null  uint8
 18  type_of_meal_plan_Not Selected            36275 non-null  uint8
 19  room_type_reserved_Room_Type 2            36275 non-null  uint8
 20  room_type_reserved_Room_Type 3            36275 non-null  uint8
 21  room_type_reserved_Room_Type 4            36275 non-null  uint8
 22  room_type_reserved_Room_Type 5            36275 non-null  uint8
 23  room_type_reserved_Room_Type 6            36275 non-null  uint8
 24  room_type_reserved_Room_Type 7            36275 non-null  uint8
 25  market_segment_type_Complementary         36275 non-null  uint8
 26  market_segment_type_Corporate             36275 non-null  uint8
 27  market_segment_type_Offline               36275 non-null  uint8
 28  market_segment_type_Online                36275 non-null  uint8
dtypes: float64(6), int64(9), object(1), uint8(13)
memory usage: 4.9+ MB
dummied_cut = pd.cut(dummy_data['lead_time'], 5, labels=['lat_min','short','med','long','advanced'])
dummied_cut.head(10)
0        med
1    lat_min
2    lat_min
3        med
4    lat_min
5       long
6    lat_min
7    lat_min
8      short
9    lat_min
Name: lead_time, dtype: category
Categories (5, object): ['lat_min' < 'short' < 'med' < 'long' < 'advanced']
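Passing an integer to `pd.cut` splits the observed range into equal-width intervals, not equal-sized groups. Since `lead_time` runs from 0 to 443 in `df.describe()`, each of the 5 bins is 88.6 days wide, and because the median lead time is 57, more than half of the bookings land in the first bin. The sketch below shows the bin edges on a handful of lead times copied from the rows displayed earlier plus the minimum and maximum; the variable names are assumptions for illustration.

```python
import pandas as pd

# A few lead times from the displayed rows, plus the dataset min (0) and max (443)
lead_time = pd.Series([0, 1, 5, 48, 211, 224, 443])

binned, edges = pd.cut(
    lead_time,
    5,
    labels=["lat_min", "short", "med", "long", "advanced"],
    retbins=True,  # also return the bin edges that pandas computed
)

print(edges)            # edges are spaced 88.6 apart; the first is nudged below 0
print(binned.tolist())  # 224 and 211 fall in 'med', matching the output above
```

If roughly equal group sizes are preferred, `pd.qcut` bins by quantiles instead; whether that helps the model here is not something this notebook tests.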
df3 = pd.merge(dummy_data, dummied_cut, left_index=True, right_index=True)
df3.head().T
| 0 | 1 | 2 | 3 | 4 | |
|---|---|---|---|---|---|
| no_of_adults | 2 | 2 | 1 | 2 | 2 |
| no_of_week_nights | 2 | 3 | 1 | 2 | 1 |
| required_car_parking_space | 0 | 0 | 0 | 0 | 0 |
| lead_time_x | 224 | 5 | 1 | 211 | 48 |
| arrival_year | 2017 | 2018 | 2018 | 2018 | 2018 |
| arrival_month | 10 | 11 | 2 | 5 | 4 |
| arrival_date | 2 | 6 | 28 | 20 | 11 |
| repeated_guest | 0 | 0 | 0 | 0 | 0 |
| avg_price_per_room | 65.00000 | 106.68000 | 60.00000 | 100.00000 | 94.50000 |
| booking_status | False | False | True | True | True |
| length_stay | 3 | 5 | 3 | 2 | 2 |
| no_of_children_log | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| no_of_weekend_nights_log | 0.69315 | 1.09861 | 1.09861 | 0.00000 | 0.69315 |
| no_of_previous_cancellations_log | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| no_of_previous_bookings_not_canceled_log | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| no_of_special_requests_log | 0.00000 | 0.69315 | 0.00000 | 0.00000 | 0.00000 |
| type_of_meal_plan_Meal Plan 2 | 0 | 0 | 0 | 0 | 0 |
| type_of_meal_plan_Meal Plan 3 | 0 | 0 | 0 | 0 | 0 |
| type_of_meal_plan_Not Selected | 0 | 1 | 0 | 0 | 1 |
| room_type_reserved_Room_Type 2 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 3 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 4 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 5 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 6 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 7 | 0 | 0 | 0 | 0 | 0 |
| market_segment_type_Complementary | 0 | 0 | 0 | 0 | 0 |
| market_segment_type_Corporate | 0 | 0 | 0 | 0 | 0 |
| market_segment_type_Offline | 1 | 0 | 0 | 0 | 0 |
| market_segment_type_Online | 0 | 1 | 1 | 1 | 1 |
| lead_time_y | med | lat_min | lat_min | med | lat_min |
# dropping the time variables, and lead_time_x since it has been binned into 5 categories.
df3_5 = df3.drop(['lead_time_x','arrival_date', 'arrival_year'], axis=1)
df4 = pd.get_dummies (
df3_5,
columns = [
'lead_time_y',
],
drop_first=True,
)
df4.head().T
| 0 | 1 | 2 | 3 | 4 | |
|---|---|---|---|---|---|
| no_of_adults | 2 | 2 | 1 | 2 | 2 |
| no_of_week_nights | 2 | 3 | 1 | 2 | 1 |
| required_car_parking_space | 0 | 0 | 0 | 0 | 0 |
| arrival_month | 10 | 11 | 2 | 5 | 4 |
| repeated_guest | 0 | 0 | 0 | 0 | 0 |
| avg_price_per_room | 65.00000 | 106.68000 | 60.00000 | 100.00000 | 94.50000 |
| booking_status | False | False | True | True | True |
| length_stay | 3 | 5 | 3 | 2 | 2 |
| no_of_children_log | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| no_of_weekend_nights_log | 0.69315 | 1.09861 | 1.09861 | 0.00000 | 0.69315 |
| no_of_previous_cancellations_log | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| no_of_previous_bookings_not_canceled_log | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| no_of_special_requests_log | 0.00000 | 0.69315 | 0.00000 | 0.00000 | 0.00000 |
| type_of_meal_plan_Meal Plan 2 | 0 | 0 | 0 | 0 | 0 |
| type_of_meal_plan_Meal Plan 3 | 0 | 0 | 0 | 0 | 0 |
| type_of_meal_plan_Not Selected | 0 | 1 | 0 | 0 | 1 |
| room_type_reserved_Room_Type 2 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 3 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 4 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 5 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 6 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 7 | 0 | 0 | 0 | 0 | 0 |
| market_segment_type_Complementary | 0 | 0 | 0 | 0 | 0 |
| market_segment_type_Corporate | 0 | 0 | 0 | 0 | 0 |
| market_segment_type_Offline | 1 | 0 | 0 | 0 | 0 |
| market_segment_type_Online | 0 | 1 | 1 | 1 | 1 |
| lead_time_y_short | 0 | 0 | 0 | 0 | 0 |
| lead_time_y_med | 1 | 0 | 0 | 1 | 0 |
| lead_time_y_long | 0 | 0 | 0 | 0 | 0 |
| lead_time_y_advanced | 0 | 0 | 0 | 0 | 0 |
df4 = df4.astype(float)
df4.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 30 columns):
 #   Column                                    Non-Null Count  Dtype
---  ------                                    --------------  -----
 0   no_of_adults                              36275 non-null  float64
 1   no_of_week_nights                         36275 non-null  float64
 2   required_car_parking_space                36275 non-null  float64
 3   arrival_month                             36275 non-null  float64
 4   repeated_guest                            36275 non-null  float64
 5   avg_price_per_room                        36275 non-null  float64
 6   booking_status                            36275 non-null  float64
 7   length_stay                               36275 non-null  float64
 8   no_of_children_log                        36275 non-null  float64
 9   no_of_weekend_nights_log                  36275 non-null  float64
 10  no_of_previous_cancellations_log          36275 non-null  float64
 11  no_of_previous_bookings_not_canceled_log  36275 non-null  float64
 12  no_of_special_requests_log                36275 non-null  float64
 13  type_of_meal_plan_Meal Plan 2             36275 non-null  float64
 14  type_of_meal_plan_Meal Plan 3             36275 non-null  float64
 15  type_of_meal_plan_Not Selected            36275 non-null  float64
 16  room_type_reserved_Room_Type 2            36275 non-null  float64
 17  room_type_reserved_Room_Type 3            36275 non-null  float64
 18  room_type_reserved_Room_Type 4            36275 non-null  float64
 19  room_type_reserved_Room_Type 5            36275 non-null  float64
 20  room_type_reserved_Room_Type 6            36275 non-null  float64
 21  room_type_reserved_Room_Type 7            36275 non-null  float64
 22  market_segment_type_Complementary         36275 non-null  float64
 23  market_segment_type_Corporate             36275 non-null  float64
 24  market_segment_type_Offline               36275 non-null  float64
 25  market_segment_type_Online                36275 non-null  float64
 26  lead_time_y_short                         36275 non-null  float64
 27  lead_time_y_med                           36275 non-null  float64
 28  lead_time_y_long                          36275 non-null  float64
 29  lead_time_y_advanced                      36275 non-null  float64
dtypes: float64(30)
memory usage: 8.3 MB
# Using the SCIEM method I will split the train test data first.
X = df4.drop("booking_status" , axis=1)
y = df4.pop("booking_status")
# adding a constant to X variable
X = add_constant(X)
# Train/Test Split 70/30
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.30, random_state=1)
print("Number of rows in train data =", X_train.shape[0])
print("Number of rows in test data =", X_test.shape[0])
Number of rows in train data = 25392
Number of rows in test data = 10883
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Percentage of classes in training set:
0.00000   0.67064
1.00000   0.32936
Name: booking_status, dtype: float64
Percentage of classes in test set:
0.00000   0.67638
1.00000   0.32362
Name: booking_status, dtype: float64
X_train.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 25392 entries, 13662 to 33003 Data columns (total 30 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 const 25392 non-null float64 1 no_of_adults 25392 non-null float64 2 no_of_week_nights 25392 non-null float64 3 required_car_parking_space 25392 non-null float64 4 arrival_month 25392 non-null float64 5 repeated_guest 25392 non-null float64 6 avg_price_per_room 25392 non-null float64 7 length_stay 25392 non-null float64 8 no_of_children_log 25392 non-null float64 9 no_of_weekend_nights_log 25392 non-null float64 10 no_of_previous_cancellations_log 25392 non-null float64 11 no_of_previous_bookings_not_canceled_log 25392 non-null float64 12 no_of_special_requests_log 25392 non-null float64 13 type_of_meal_plan_Meal Plan 2 25392 non-null float64 14 type_of_meal_plan_Meal Plan 3 25392 non-null float64 15 type_of_meal_plan_Not Selected 25392 non-null float64 16 room_type_reserved_Room_Type 2 25392 non-null float64 17 room_type_reserved_Room_Type 3 25392 non-null float64 18 room_type_reserved_Room_Type 4 25392 non-null float64 19 room_type_reserved_Room_Type 5 25392 non-null float64 20 room_type_reserved_Room_Type 6 25392 non-null float64 21 room_type_reserved_Room_Type 7 25392 non-null float64 22 market_segment_type_Complementary 25392 non-null float64 23 market_segment_type_Corporate 25392 non-null float64 24 market_segment_type_Offline 25392 non-null float64 25 market_segment_type_Online 25392 non-null float64 26 lead_time_y_short 25392 non-null float64 27 lead_time_y_med 25392 non-null float64 28 lead_time_y_long 25392 non-null float64 29 lead_time_y_advanced 25392 non-null float64 dtypes: float64(30) memory usage: 6.0 MB
# let's check the VIF of the predictors
vif_series = pd.Series(
[variance_inflation_factor(X_train.values, i) for i in range(X_train.shape[1])],
index=X_train.columns,
dtype=float,
)
print("VIF values: \n\n{}\n".format(vif_series))
VIF values:

const                                     326.14192
no_of_adults                              1.34666
no_of_week_nights                         100.27746
required_car_parking_space                1.04158
arrival_month                             1.05151
repeated_guest                            3.34004
avg_price_per_room                        1.93604
length_stay                               146.44254
no_of_children_log                        1.86632
no_of_weekend_nights_log                  34.42876
no_of_previous_cancellations_log          1.59714
no_of_previous_bookings_not_canceled_log  3.50891
no_of_special_requests_log                1.26796
type_of_meal_plan_Meal Plan 2             1.21752
type_of_meal_plan_Meal Plan 3             1.02532
type_of_meal_plan_Not Selected            1.23653
room_type_reserved_Room_Type 2            1.09067
room_type_reserved_Room_Type 3            1.00338
room_type_reserved_Room_Type 4            1.36465
room_type_reserved_Room_Type 5            1.02802
room_type_reserved_Room_Type 6            1.85857
room_type_reserved_Room_Type 7            1.11100
market_segment_type_Complementary         4.50724
market_segment_type_Corporate             16.93002
market_segment_type_Offline               64.01666
market_segment_type_Online                71.24829
lead_time_y_short                         1.11927
lead_time_y_med                           1.10532
lead_time_y_long                          1.15163
lead_time_y_advanced                      1.04712
dtype: float64
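To make the VIF numbers above more concrete, here is a minimal sketch of the idea behind them: each predictor is regressed on all the other predictors (plus an intercept), and VIF = 1 / (1 - R²). The helper name `vif_by_hand` and the synthetic columns are assumptions for illustration only; they are not the notebook's data. The example mimics a total that is almost the sum of two other columns, which is the same pattern as `length_stay` next to the week-night and weekend-night counts.

```python
import numpy as np

def vif_by_hand(X):
    """VIF of each column of X (X holds predictors only, without a constant column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # regress column j on an intercept plus all remaining columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Synthetic data (an assumption, not the hotel dataset)
rng = np.random.default_rng(0)
week = rng.integers(0, 6, size=500).astype(float)
weekend = rng.integers(0, 3, size=500).astype(float)
total = week + weekend + rng.normal(scale=0.1, size=500)  # nearly the exact sum
price = rng.normal(100, 20, size=500)                     # unrelated column

vifs = vif_by_hand(np.column_stack([week, weekend, total, price]))
print(dict(zip(["week", "weekend", "total", "price"], vifs.round(2))))
```

In this toy setup the three related columns get large VIFs while the unrelated one stays close to 1, which is why dropping the components of a derived total is a common first step.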
# dropping the weekend-night and week-night counts because they are already combined into length_stay, and the market segment dummies because of their high multicollinearity (VIF) values
X_train1 = X_train.drop(['no_of_weekend_nights_log',
'no_of_week_nights',
'market_segment_type_Online',
'market_segment_type_Offline',
'market_segment_type_Corporate',
'market_segment_type_Complementary'],
axis=1)
logit = sm.Logit(y_train, X_train1.astype(float))
lg = logit.fit()
Optimization terminated successfully.
Current function value: 0.463427
Iterations 10
# Print the logistic regression summary
print(lg.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25392
Model: Logit Df Residuals: 25368
Method: MLE Df Model: 23
Date: Sun, 14 Jul 2024 Pseudo R-squ.: 0.2687
Time: 11:01:24 Log-Likelihood: -11767.
converged: True LL-Null: -16091.
Covariance Type: nonrobust LLR p-value: 0.000
============================================================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------------------------------------
const -3.6818 0.098 -37.650 0.000 -3.873 -3.490
no_of_adults 0.2321 0.035 6.614 0.000 0.163 0.301
required_car_parking_space -1.4537 0.135 -10.742 0.000 -1.719 -1.188
arrival_month -0.0668 0.006 -11.685 0.000 -0.078 -0.056
repeated_guest -2.6424 0.630 -4.193 0.000 -3.878 -1.407
avg_price_per_room 0.0229 0.001 33.788 0.000 0.022 0.024
length_stay 0.1088 0.009 11.946 0.000 0.091 0.127
no_of_children_log 0.5488 0.093 5.887 0.000 0.366 0.732
no_of_previous_cancellations_log 1.2323 0.490 2.515 0.012 0.272 2.193
no_of_previous_bookings_not_canceled_log -0.6731 0.477 -1.411 0.158 -1.608 0.262
no_of_special_requests_log -1.9180 0.044 -43.892 0.000 -2.004 -1.832
type_of_meal_plan_Meal Plan 2 -0.3480 0.056 -6.165 0.000 -0.459 -0.237
type_of_meal_plan_Meal Plan 3 1.7182 2.912 0.590 0.555 -3.989 7.425
type_of_meal_plan_Not Selected 0.8463 0.048 17.563 0.000 0.752 0.941
room_type_reserved_Room_Type 2 0.1288 0.123 1.045 0.296 -0.113 0.370
room_type_reserved_Room_Type 3 -0.2278 1.194 -0.191 0.849 -2.567 2.111
room_type_reserved_Room_Type 4 0.0548 0.050 1.095 0.273 -0.043 0.153
room_type_reserved_Room_Type 5 -0.9272 0.196 -4.735 0.000 -1.311 -0.543
room_type_reserved_Room_Type 6 -1.0662 0.135 -7.903 0.000 -1.331 -0.802
room_type_reserved_Room_Type 7 -1.8078 0.286 -6.331 0.000 -2.368 -1.248
lead_time_y_short 1.3167 0.039 34.051 0.000 1.241 1.393
lead_time_y_med 2.8622 0.058 49.315 0.000 2.748 2.976
lead_time_y_long 3.0529 0.077 39.428 0.000 2.901 3.205
lead_time_y_advanced 4.5673 0.247 18.478 0.000 4.083 5.052
============================================================================================================
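The "Pseudo R-squ." and "Current function value" figures in the summary above can be reproduced from the log-likelihoods it prints. The sketch below assumes the pseudo R-squared is McFadden's (1 - LL / LL-Null) and that the reported function value is the average negative log-likelihood per observation; the inputs are the rounded values from the summary, so the results agree only up to rounding.

```python
# Values copied from the summary above (rounded as printed)
log_likelihood = -11767.0   # Log-Likelihood of the fitted model
ll_null = -16091.0          # LL-Null: intercept-only model
n_obs = 25392               # No. Observations

# McFadden's pseudo R-squared: 1 - LL_model / LL_null
pseudo_r2 = 1 - log_likelihood / ll_null

# Average negative log-likelihood per observation
mean_neg_ll = -log_likelihood / n_obs

print(f"pseudo R^2 ~ {pseudo_r2:.4f}")
print(f"mean negative log-likelihood ~ {mean_neg_ll:.4f}")
```

Because the later models report nearly the same log-likelihood, their pseudo R-squared barely moves when the insignificant features are dropped.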
# Checking the VIF of the predictors
vif_series = pd.Series(
[variance_inflation_factor(X_train1.values, i) for i in range(X_train1.shape[1])],
index=X_train1.columns,
dtype=float,
)
print("VIF values: \n\n{}\n".format(vif_series))
VIF values:

const                                     29.38943
no_of_adults                              1.27904
required_car_parking_space                1.03734
arrival_month                             1.04591
repeated_guest                            3.21685
avg_price_per_room                        1.58350
length_stay                               1.07689
no_of_children_log                        1.85549
no_of_previous_cancellations_log          1.57679
no_of_previous_bookings_not_canceled_log  3.44533
no_of_special_requests_log                1.13382
type_of_meal_plan_Meal Plan 2             1.13453
type_of_meal_plan_Meal Plan 3             1.01864
type_of_meal_plan_Not Selected            1.10882
room_type_reserved_Room_Type 2            1.07952
room_type_reserved_Room_Type 3            1.00088
room_type_reserved_Room_Type 4            1.31700
room_type_reserved_Room_Type 5            1.01351
room_type_reserved_Room_Type 6            1.83383
room_type_reserved_Room_Type 7            1.07227
lead_time_y_short                         1.10534
lead_time_y_med                           1.09193
lead_time_y_long                          1.12297
lead_time_y_advanced                      1.04374
dtype: float64
# Training performance at a 0.5 probability threshold
pred_train = (lg.predict(X_train1) > 0.5).astype(int)
X_train2 = X_train1.drop(['room_type_reserved_Room_Type 3'], axis=1)
X_train2.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 25392 entries, 13662 to 33003 Data columns (total 23 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 const 25392 non-null float64 1 no_of_adults 25392 non-null float64 2 required_car_parking_space 25392 non-null float64 3 arrival_month 25392 non-null float64 4 repeated_guest 25392 non-null float64 5 avg_price_per_room 25392 non-null float64 6 length_stay 25392 non-null float64 7 no_of_children_log 25392 non-null float64 8 no_of_previous_cancellations_log 25392 non-null float64 9 no_of_previous_bookings_not_canceled_log 25392 non-null float64 10 no_of_special_requests_log 25392 non-null float64 11 type_of_meal_plan_Meal Plan 2 25392 non-null float64 12 type_of_meal_plan_Meal Plan 3 25392 non-null float64 13 type_of_meal_plan_Not Selected 25392 non-null float64 14 room_type_reserved_Room_Type 2 25392 non-null float64 15 room_type_reserved_Room_Type 4 25392 non-null float64 16 room_type_reserved_Room_Type 5 25392 non-null float64 17 room_type_reserved_Room_Type 6 25392 non-null float64 18 room_type_reserved_Room_Type 7 25392 non-null float64 19 lead_time_y_short 25392 non-null float64 20 lead_time_y_med 25392 non-null float64 21 lead_time_y_long 25392 non-null float64 22 lead_time_y_advanced 25392 non-null float64 dtypes: float64(23) memory usage: 4.6 MB
logit = sm.Logit(y_train, X_train2.astype(float))
lg2 = logit.fit()
Optimization terminated successfully.
Current function value: 0.463428
Iterations 10
print(lg2.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25392
Model: Logit Df Residuals: 25369
Method: MLE Df Model: 22
Date: Sun, 14 Jul 2024 Pseudo R-squ.: 0.2687
Time: 11:02:52 Log-Likelihood: -11767.
converged: True LL-Null: -16091.
Covariance Type: nonrobust LLR p-value: 0.000
============================================================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------------------------------------
const -3.6818 0.098 -37.651 0.000 -3.873 -3.490
no_of_adults 0.2321 0.035 6.614 0.000 0.163 0.301
required_car_parking_space -1.4536 0.135 -10.742 0.000 -1.719 -1.188
arrival_month -0.0668 0.006 -11.687 0.000 -0.078 -0.056
repeated_guest -2.6423 0.630 -4.193 0.000 -3.878 -1.407
avg_price_per_room 0.0229 0.001 33.789 0.000 0.022 0.024
length_stay 0.1089 0.009 11.948 0.000 0.091 0.127
no_of_children_log 0.5489 0.093 5.887 0.000 0.366 0.732
no_of_previous_cancellations_log 1.2322 0.490 2.515 0.012 0.272 2.193
no_of_previous_bookings_not_canceled_log -0.6731 0.477 -1.411 0.158 -1.608 0.262
no_of_special_requests_log -1.9180 0.044 -43.892 0.000 -2.004 -1.832
type_of_meal_plan_Meal Plan 2 -0.3479 0.056 -6.163 0.000 -0.459 -0.237
type_of_meal_plan_Meal Plan 3 1.7183 2.912 0.590 0.555 -3.988 7.425
type_of_meal_plan_Not Selected 0.8463 0.048 17.564 0.000 0.752 0.941
room_type_reserved_Room_Type 2 0.1289 0.123 1.046 0.296 -0.113 0.371
room_type_reserved_Room_Type 4 0.0549 0.050 1.097 0.273 -0.043 0.153
room_type_reserved_Room_Type 5 -0.9271 0.196 -4.735 0.000 -1.311 -0.543
room_type_reserved_Room_Type 6 -1.0662 0.135 -7.903 0.000 -1.331 -0.802
room_type_reserved_Room_Type 7 -1.8078 0.286 -6.331 0.000 -2.367 -1.248
lead_time_y_short 1.3166 0.039 34.051 0.000 1.241 1.392
lead_time_y_med 2.8622 0.058 49.315 0.000 2.748 2.976
lead_time_y_long 3.0529 0.077 39.429 0.000 2.901 3.205
lead_time_y_advanced 4.5673 0.247 18.478 0.000 4.083 5.052
============================================================================================================
X_train3 = X_train2.drop(['no_of_previous_bookings_not_canceled_log'], axis=1)
logit = sm.Logit(y_train, X_train3.astype(float))
lg3 = logit.fit()
Optimization terminated successfully.
Current function value: 0.463479
Iterations 9
print(lg3.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25392
Model: Logit Df Residuals: 25370
Method: MLE Df Model: 21
Date: Sun, 14 Jul 2024 Pseudo R-squ.: 0.2686
Time: 11:03:36 Log-Likelihood: -11769.
converged: True LL-Null: -16091.
Covariance Type: nonrobust LLR p-value: 0.000
====================================================================================================
coef std err z P>|z| [0.025 0.975]
----------------------------------------------------------------------------------------------------
const -3.6853 0.098 -37.686 0.000 -3.877 -3.494
no_of_adults 0.2328 0.035 6.633 0.000 0.164 0.302
required_car_parking_space -1.4531 0.135 -10.738 0.000 -1.718 -1.188
arrival_month -0.0667 0.006 -11.679 0.000 -0.078 -0.056
repeated_guest -2.9666 0.574 -5.169 0.000 -4.092 -1.842
avg_price_per_room 0.0229 0.001 33.812 0.000 0.022 0.024
length_stay 0.1089 0.009 11.954 0.000 0.091 0.127
no_of_children_log 0.5492 0.093 5.890 0.000 0.366 0.732
no_of_previous_cancellations_log 0.9583 0.399 2.401 0.016 0.176 1.741
no_of_special_requests_log -1.9193 0.044 -43.924 0.000 -2.005 -1.834
type_of_meal_plan_Meal Plan 2 -0.3491 0.056 -6.183 0.000 -0.460 -0.238
type_of_meal_plan_Meal Plan 3 1.7180 2.914 0.590 0.555 -3.992 7.429
type_of_meal_plan_Not Selected 0.8466 0.048 17.568 0.000 0.752 0.941
room_type_reserved_Room_Type 2 0.1290 0.123 1.046 0.296 -0.113 0.371
room_type_reserved_Room_Type 4 0.0543 0.050 1.085 0.278 -0.044 0.152
room_type_reserved_Room_Type 5 -0.9291 0.196 -4.746 0.000 -1.313 -0.545
room_type_reserved_Room_Type 6 -1.0676 0.135 -7.913 0.000 -1.332 -0.803
room_type_reserved_Room_Type 7 -1.8098 0.286 -6.337 0.000 -2.370 -1.250
lead_time_y_short 1.3170 0.039 34.060 0.000 1.241 1.393
lead_time_y_med 2.8638 0.058 49.342 0.000 2.750 2.978
lead_time_y_long 3.0543 0.077 39.438 0.000 2.903 3.206
lead_time_y_advanced 4.5896 0.248 18.494 0.000 4.103 5.076
====================================================================================================
# let's check the VIF of the predictors again to see if any multicollinearity persists
vif_series = pd.Series(
[variance_inflation_factor(X_train3.values, i) for i in range(X_train3.shape[1])],
index=X_train3.columns,
dtype=float,
)
print("VIF values: \n\n{}\n".format(vif_series))
VIF values:

const                                     29.23951
no_of_adults                              1.27645
required_car_parking_space                1.03658
arrival_month                             1.04468
repeated_guest                            1.55246
avg_price_per_room                        1.57899
length_stay                               1.07664
no_of_children_log                        1.85538
no_of_previous_cancellations_log          1.42609
no_of_special_requests_log                1.12828
type_of_meal_plan_Meal Plan 2             1.13429
type_of_meal_plan_Meal Plan 3             1.01858
type_of_meal_plan_Not Selected            1.10868
room_type_reserved_Room_Type 2            1.07948
room_type_reserved_Room_Type 4            1.31640
room_type_reserved_Room_Type 5            1.01261
room_type_reserved_Room_Type 6            1.83272
room_type_reserved_Room_Type 7            1.07151
lead_time_y_short                         1.10526
lead_time_y_med                           1.09192
lead_time_y_long                          1.12291
lead_time_y_advanced                      1.04351
dtype: float64
X_train4 = X_train3.drop(['room_type_reserved_Room_Type 2'], axis=1)
logit = sm.Logit(y_train, X_train4.astype(float))
lg4 = logit.fit()
Optimization terminated successfully.
Current function value: 0.463500
Iterations 9
print(lg4.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25392
Model: Logit Df Residuals: 25371
Method: MLE Df Model: 20
Date: Sun, 14 Jul 2024 Pseudo R-squ.: 0.2686
Time: 11:04:38 Log-Likelihood: -11769.
converged: True LL-Null: -16091.
Covariance Type: nonrobust LLR p-value: 0.000
====================================================================================================
coef std err z P>|z| [0.025 0.975]
----------------------------------------------------------------------------------------------------
const -3.6755 0.097 -37.773 0.000 -3.866 -3.485
no_of_adults 0.2313 0.035 6.592 0.000 0.162 0.300
required_car_parking_space -1.4496 0.135 -10.723 0.000 -1.715 -1.185
arrival_month -0.0669 0.006 -11.722 0.000 -0.078 -0.056
repeated_guest -2.9693 0.574 -5.173 0.000 -4.094 -1.844
avg_price_per_room 0.0229 0.001 33.816 0.000 0.022 0.024
length_stay 0.1090 0.009 11.964 0.000 0.091 0.127
no_of_children_log 0.5688 0.091 6.225 0.000 0.390 0.748
no_of_previous_cancellations_log 0.9574 0.399 2.398 0.016 0.175 1.740
no_of_special_requests_log -1.9178 0.044 -43.917 0.000 -2.003 -1.832
type_of_meal_plan_Meal Plan 2 -0.3510 0.056 -6.220 0.000 -0.462 -0.240
type_of_meal_plan_Meal Plan 3 1.7193 2.911 0.591 0.555 -3.987 7.425
type_of_meal_plan_Not Selected 0.8443 0.048 17.541 0.000 0.750 0.939
room_type_reserved_Room_Type 4 0.0528 0.050 1.056 0.291 -0.045 0.151
room_type_reserved_Room_Type 5 -0.9320 0.196 -4.761 0.000 -1.316 -0.548
room_type_reserved_Room_Type 6 -1.0857 0.134 -8.110 0.000 -1.348 -0.823
room_type_reserved_Room_Type 7 -1.8191 0.286 -6.369 0.000 -2.379 -1.259
lead_time_y_short 1.3172 0.039 34.067 0.000 1.241 1.393
lead_time_y_med 2.8676 0.058 49.483 0.000 2.754 2.981
lead_time_y_long 3.0549 0.077 39.444 0.000 2.903 3.207
lead_time_y_advanced 4.5912 0.248 18.498 0.000 4.105 5.078
====================================================================================================
X_train5 = X_train4.drop(['room_type_reserved_Room_Type 4'], axis=1)
logit = sm.Logit(y_train, X_train5.astype(float))
lg5 = logit.fit()
Optimization terminated successfully.
Current function value: 0.463522
Iterations 9
print(lg5.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25392
Model: Logit Df Residuals: 25372
Method: MLE Df Model: 19
Date: Sun, 14 Jul 2024 Pseudo R-squ.: 0.2686
Time: 11:05:24 Log-Likelihood: -11770.
converged: True LL-Null: -16091.
Covariance Type: nonrobust LLR p-value: 0.000
====================================================================================================
coef std err z P>|z| [0.025 0.975]
----------------------------------------------------------------------------------------------------
const -3.7035 0.094 -39.515 0.000 -3.887 -3.520
no_of_adults 0.2398 0.034 7.023 0.000 0.173 0.307
required_car_parking_space -1.4500 0.135 -10.728 0.000 -1.715 -1.185
arrival_month -0.0672 0.006 -11.767 0.000 -0.078 -0.056
repeated_guest -2.9647 0.574 -5.163 0.000 -4.090 -1.839
avg_price_per_room 0.0231 0.001 35.959 0.000 0.022 0.024
length_stay 0.1100 0.009 12.154 0.000 0.092 0.128
no_of_children_log 0.5594 0.091 6.150 0.000 0.381 0.738
no_of_previous_cancellations_log 0.9559 0.400 2.392 0.017 0.173 1.739
no_of_special_requests_log -1.9157 0.044 -43.926 0.000 -2.001 -1.830
type_of_meal_plan_Meal Plan 2 -0.3601 0.056 -6.456 0.000 -0.469 -0.251
type_of_meal_plan_Meal Plan 3 1.7283 2.969 0.582 0.560 -4.090 7.547
type_of_meal_plan_Not Selected 0.8335 0.047 17.728 0.000 0.741 0.926
room_type_reserved_Room_Type 5 -0.9481 0.195 -4.858 0.000 -1.331 -0.566
room_type_reserved_Room_Type 6 -1.1095 0.132 -8.409 0.000 -1.368 -0.851
room_type_reserved_Room_Type 7 -1.8578 0.283 -6.556 0.000 -2.413 -1.302
lead_time_y_short 1.3157 0.039 34.057 0.000 1.240 1.391
lead_time_y_med 2.8630 0.058 49.547 0.000 2.750 2.976
lead_time_y_long 3.0495 0.077 39.450 0.000 2.898 3.201
lead_time_y_advanced 4.5867 0.248 18.479 0.000 4.100 5.073
====================================================================================================
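The manual steps above (drop a feature with a high p-value, refit, check the summary again) follow a backward-elimination pattern that can be sketched as a loop. The names `backward_eliminate` and `get_pvalues` are hypothetical. In this notebook, `get_pvalues` could plausibly wrap something like `sm.Logit(y_train, X_train1[cols].astype(float)).fit(disp=False).pvalues.to_dict()`; the toy p-values below are a stand-in so the sketch runs on its own.

```python
def backward_eliminate(columns, get_pvalues, alpha=0.05, keep=("const",)):
    """Repeatedly drop the feature with the largest p-value above alpha.

    columns     : list of candidate column names
    get_pvalues : callable(list_of_columns) -> dict {column: p-value}
    keep        : columns that are never dropped (e.g. the intercept)
    """
    selected = list(columns)
    while True:
        pvalues = get_pvalues(selected)
        candidates = {c: p for c, p in pvalues.items() if c not in keep}
        if not candidates:
            break
        worst = max(candidates, key=candidates.get)
        if candidates[worst] <= alpha:
            break  # every remaining feature is significant at level alpha
        selected.remove(worst)
    return selected

# Toy stand-in for a model fit: fixed p-values, purely illustrative
toy_pvalues = {"const": 0.0, "a": 0.001, "b": 0.849, "c": 0.296, "d": 0.03}
selected = backward_eliminate(
    list(toy_pvalues), lambda cols: {c: toy_pvalues[c] for c in cols}
)
print(selected)
```

Note that a loop like this treats every dummy column independently, as the manual steps do. Applied with alpha = 0.05 to the final summary above, it would also flag `type_of_meal_plan_Meal Plan 3` (p = 0.560), which is still present in `lg5`.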
The model can be wrong in two ways: predicting that a guest will cancel when they do not (a false positive), or predicting that a guest will not cancel when they do (a false negative). Which case is more important? Both are:
If we predict that a guest will cancel and they do not, we may reallocate their room to another guest and have no room available when they arrive. That costs the hotel a significant amount of money (for example, by offering a complimentary upgraded room), risks losing a repeat customer, and can generate negative reviews for the hotel.
If we predict that a guest will not cancel and they do, we lose the revenue from their reservation, incur the cost of remarketing the room, and will more than likely rebook the room at a discount.
Because both errors are costly, the F1 score should be maximized: the higher the F1 score, the better the model avoids both false positives and false negatives.
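As a small worked example of the metric discussed above, the sketch below computes precision, recall, and F1 for the positive class (canceled bookings) from confusion-matrix counts. The function name and the counts are hypothetical and are not results from this notebook.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 for the positive class (here: canceled bookings)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts, not taken from the notebook's results
precision, recall, f1 = precision_recall_f1(tp=60, fp=20, fn=40)
print(precision, recall, round(f1, 4))
```

False positives lower precision and false negatives lower recall, so F1 only stays high when both kinds of error are kept small.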
# converting coefficients (log-odds) to odds ratios
odds = np.exp(lg5.params)
# adding the odds to a dataframe
pd.DataFrame(odds, X_train5.columns, columns=["odds"]).T
| | const | no_of_adults | required_car_parking_space | arrival_month | repeated_guest | avg_price_per_room | length_stay | no_of_children_log | no_of_previous_cancellations_log | no_of_special_requests_log | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Meal Plan 3 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | lead_time_y_short | lead_time_y_med | lead_time_y_long | lead_time_y_advanced |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| odds | 0.02464 | 1.27101 | 0.23458 | 0.93503 | 0.05158 | 1.02334 | 1.11633 | 1.74963 | 2.60091 | 0.14724 | 0.69760 | 5.63133 | 2.30130 | 0.38748 | 0.32973 | 0.15602 | 3.72746 | 17.51483 | 21.10572 | 98.17057 |
# finding the percentage change in odds
perc_change_odds = (np.exp(lg5.params) - 1) * 100
# adding the change_odds% to a dataframe (indexed by the lg5 feature set, X_train5)
pd.DataFrame(perc_change_odds, X_train5.columns, columns=["change_odds%"]).T
| | const | no_of_adults | required_car_parking_space | arrival_month | repeated_guest | avg_price_per_room | length_stay | no_of_children_log | no_of_previous_cancellations_log | no_of_special_requests_log | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Meal Plan 3 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | lead_time_y_short | lead_time_y_med | lead_time_y_long | lead_time_y_advanced |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| change_odds% | -97.53635 | 27.10052 | -76.54209 | -6.49682 | -94.84247 | 2.33444 | 11.63310 | 74.96254 | 160.09103 | -85.27628 | -30.24019 | 463.13278 | 130.13023 | -61.25159 | -67.02661 | -84.39843 | 272.74612 | 1651.48264 | 2010.57241 | 9717.05650 |
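To show how the numbers in the two tables above are derived and read, here is a short sketch using a few coefficients copied from the `lg5` summary (rounded to four decimals, so the results match the tables only approximately). The dictionary `coefs` is just an illustrative container. Each odds ratio is interpreted with the other features held fixed, and for dummy variables it is relative to the omitted reference category.

```python
import math

# Coefficients copied from the lg5 summary (rounded to 4 decimals as printed)
coefs = {
    "repeated_guest": -2.9647,
    "avg_price_per_room": 0.0231,
    "lead_time_y_short": 1.3157,
}

for name, beta in coefs.items():
    odds_ratio = math.exp(beta)           # multiplicative change in the odds
    pct_change = (odds_ratio - 1) * 100   # same change expressed in percent
    print(f"{name}: odds ratio {odds_ratio:.3f}, change in odds {pct_change:+.1f}%")

# For a continuous feature the effect compounds: a 10-unit price increase
# multiplies the odds by exp(10 * beta), not by 10 * exp(beta).
price_or_10 = math.exp(10 * coefs["avg_price_per_room"])
print(f"10-unit price increase: odds x {price_or_10:.3f}")
```

Read this way, a repeated guest has roughly 95% lower odds of canceling than a first-time guest, while each additional unit of room price raises the odds by a little over 2%.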
# refitting the final model (lg5) on the training set and predicting at a 0.5 threshold
logit = sm.Logit(y_train, X_train5.astype(float))
lg5 = logit.fit()
pred_train4 = lg5.predict(X_train5)
pred_train4 = np.round(pred_train4)
Optimization terminated successfully.
Current function value: 0.463522
Iterations 9
# confusion matrix for the final model on the training set
cm = confusion_matrix(y_train, pred_train4)
plt.figure(figsize=(7, 5))
sns.heatmap(cm, annot=True, fmt="g")
plt.xlabel("Predicted Values")
plt.ylabel("Actual Values")
plt.show()
print("Accuracy on training set : ", accuracy_score(y_train, pred_train4))
Accuracy on training set : 0.7811515437933207
logit_roc_auc_train = roc_auc_score(y_train, lg5.predict(X_train5))
fpr, tpr, thresholds = roc_curve(y_train, lg5.predict(X_train5))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_train)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
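The arrays returned by `roc_curve` can also suggest a probability cut-off other than 0.5. One common heuristic, which is an addition here and not something this notebook applies, is Youden's J statistic: choose the threshold where `tpr - fpr` is largest. The helper name `youden_threshold` and the toy ROC points are made up for illustration; in the notebook the inputs would be the `fpr`, `tpr`, and `thresholds` arrays computed above.

```python
def youden_threshold(fpr, tpr, thresholds):
    """Return (threshold, J) for the ROC point where J = tpr - fpr is largest."""
    best_idx = max(range(len(thresholds)), key=lambda i: tpr[i] - fpr[i])
    return thresholds[best_idx], tpr[best_idx] - fpr[best_idx]

# Made-up ROC points for illustration only
toy_fpr = [0.0, 0.1, 0.2, 0.5, 1.0]
toy_tpr = [0.0, 0.5, 0.8, 0.9, 1.0]
toy_thresholds = [1.0, 0.7, 0.4, 0.2, 0.0]

best_threshold, best_j = youden_threshold(toy_fpr, toy_tpr, toy_thresholds)
print(best_threshold, round(best_j, 2))
```

Youden's J weights false positives and false negatives equally in rate terms, so it is only a starting point when the two errors carry different business costs.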
# dropping variables from test set as well which were dropped from training set
X_test1 = X_test.drop([ 'no_of_weekend_nights_log',
'no_of_week_nights',
'market_segment_type_Online',
'market_segment_type_Offline',
'market_segment_type_Corporate',
'market_segment_type_Complementary',
'room_type_reserved_Room_Type 3',
'room_type_reserved_Room_Type 4',
'no_of_previous_bookings_not_canceled_log',
'room_type_reserved_Room_Type 2'
], axis=1)
pred_test = (lg5.predict(X_test1) > 0.5).astype(int)
print("Accuracy on training set : ", accuracy_score(y_train, pred_train4))
print("Accuracy on test set : ", accuracy_score(y_test, pred_test))
Accuracy on training set : 0.7811515437933207
Accuracy on test set : 0.7846182118901038
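The predictions above use a fixed 0.5 cut-off and are scored by accuracy, while the stated goal is to maximize F1. A minimal sketch of how a cut-off could be chosen for F1 instead is shown below. The function names and the toy data are assumptions; in the notebook the inputs would be `y_train` and `lg5.predict(X_train5)`, and the threshold should be chosen on training or validation data rather than on the test set.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 of the positive class when predicting 1 for y_prob > threshold."""
    tp = fp = fn = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob > threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_f1_threshold(y_true, y_prob, grid=None):
    """Scan a grid of thresholds and return (threshold, f1) with the highest F1."""
    grid = grid or [i / 100 for i in range(5, 100, 5)]
    scores = [(f1_at_threshold(y_true, y_prob, t), t) for t in grid]
    best_f1, best_t = max(scores, key=lambda pair: pair[0])
    return best_t, best_f1

# Toy labels and probabilities, purely illustrative
toy_y = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
toy_p = [0.10, 0.20, 0.30, 0.45, 0.40, 0.60, 0.80, 0.55, 0.35, 0.05]
threshold, f1 = best_f1_threshold(toy_y, toy_p)
print(threshold, round(f1, 3))
```

With an imbalanced target like this one (about a third of bookings are canceled), the F1-optimal threshold is often below 0.5, but that has to be checked on the actual predicted probabilities.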
tree_data = dummy_data.astype(float)
tree_data = tree_data.drop(['arrival_date','arrival_year','no_of_week_nights',
'no_of_weekend_nights_log' ], axis=1)
X = tree_data.drop("booking_status" , axis=1)
y = tree_data.pop("booking_status")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.30, random_state=1)
# Building a decision tree using the DecisionTreeClassifier class
dTree = DecisionTreeClassifier(criterion = 'gini', random_state=1)
dTree.fit(X_train, y_train)
DecisionTreeClassifier(random_state=1)
Yes, using a simplified data set for the tree
#scoring the accuracy on train & test data
print("Accuracy on training set : ",dTree.score(X_train, y_train))
print("Accuracy on test set : ",dTree.score(X_test, y_test))
Accuracy on training set : 0.9924385633270322
Accuracy on test set : 0.8585867867315997
# Checking the positive outcomes
y.sum(axis = 0)
11885.0
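The positive count above gives a useful reference point for the accuracy figures reported in this notebook (about 0.78 for the logistic regression and about 0.86 on the test set for the decision tree). The sketch below, using only the counts printed in the outputs, computes the cancellation rate and the accuracy of a trivial model that always predicts "not canceled".

```python
# Counts taken from the outputs above
n_total = 36275        # rows in the dataset (RangeIndex: 36275 entries)
n_canceled = 11885     # y.sum(): bookings with booking_status == 1

cancel_rate = n_canceled / n_total
# Accuracy of a trivial model that always predicts "not canceled"
baseline_accuracy = 1 - cancel_rate

print(f"cancellation rate: {cancel_rate:.4f}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.4f}")
```

Any model has to beat this majority-class baseline to be useful, which is one more reason to look at recall, precision, and F1 rather than accuracy alone.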
## Function to create confusion matrix
def make_confusion_matrix(model, y_actual, labels=[1, 0]):
    '''
    model : classifier to predict values of X
    y_actual : ground truth
    labels : kept for backward compatibility; the class order is fixed to [0, 1]

    Note: predictions are made on the global X_test, so y_actual must be y_test.
    '''
    y_predict = model.predict(X_test)
    cm = metrics.confusion_matrix(y_actual, y_predict, labels=[0, 1])
    df_cm = pd.DataFrame(cm, index=["Actual - No", "Actual - Yes"],
                         columns=["Predicted - No", "Predicted - Yes"])
    group_counts = ["{0:0.0f}".format(value) for value in cm.flatten()]
    group_percentages = ["{0:.2%}".format(value) for value in cm.flatten() / np.sum(cm)]
    # cell annotations: count on the first line, share of all samples on the second
    annot = [f"{v1}\n{v2}" for v1, v2 in zip(group_counts, group_percentages)]
    annot = np.asarray(annot).reshape(2, 2)
    plt.figure(figsize=(10, 7))
    sns.heatmap(df_cm, annot=annot, fmt='')
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
the_features = list(X.columns)
print(the_features)
['no_of_adults', 'required_car_parking_space', 'lead_time', 'arrival_month', 'repeated_guest', 'avg_price_per_room', 'length_stay', 'no_of_children_log', 'no_of_previous_cancellations_log', 'no_of_previous_bookings_not_canceled_log', 'no_of_special_requests_log', 'type_of_meal_plan_Meal Plan 2', 'type_of_meal_plan_Meal Plan 3', 'type_of_meal_plan_Not Selected', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 3', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Complementary', 'market_segment_type_Corporate', 'market_segment_type_Offline', 'market_segment_type_Online']
plt.figure(figsize=(20,30))
tree.plot_tree(dTree,feature_names=the_features,filled=True,fontsize=9,node_ids=True,class_names=True)
plt.show()
print(tree.export_text(dTree,feature_names=the_features,show_weights=True))
|--- lead_time <= 151.50 | |--- no_of_special_requests_log <= 0.35 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 90.50 | | | | |--- length_stay <= 5.50 | | | | | |--- avg_price_per_room <= 201.50 | | | | | | |--- lead_time <= 74.50 | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | |--- lead_time <= 1.50 | | | | | | | | | |--- avg_price_per_room <= 62.00 | | | | | | | | | | |--- avg_price_per_room <= 57.50 | | | | | | | | | | | |--- weights: [15.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 57.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- avg_price_per_room > 62.00 | | | | | | | | | | |--- avg_price_per_room <= 151.59 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- avg_price_per_room > 151.59 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- lead_time > 1.50 | | | | | | | | | |--- lead_time <= 59.50 | | | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- lead_time > 59.50 | | | | | | | | | | |--- avg_price_per_room <= 138.00 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- avg_price_per_room > 138.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- arrival_month > 5.50 | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- repeated_guest <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | | | |--- repeated_guest > 0.50 | | | | | | | | | | | |--- weights: [169.00, 0.00] class: 0.0 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- lead_time <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | 
| | |--- lead_time > 9.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | |--- avg_price_per_room <= 50.00 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- weights: [19.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 50.00 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | |--- lead_time > 74.50 | | | | | | | |--- lead_time <= 78.50 | | | | | | | | |--- avg_price_per_room <= 95.47 | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | |--- avg_price_per_room <= 69.85 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 69.85 | | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1.0 | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | |--- weights: [26.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 95.47 | | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | | |--- avg_price_per_room <= 120.24 | | | | | | | | | | | |--- weights: [0.00, 30.00] class: 1.0 | | | | | | | | | | |--- avg_price_per_room > 120.24 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | |--- lead_time > 78.50 | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | | |--- length_stay <= 2.50 | | | | | | | | | | | |--- weights: [110.00, 0.00] class: 0.0 | | | | | | | | | | |--- length_stay > 2.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | | |--- 
lead_time <= 86.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 86.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | |--- avg_price_per_room <= 66.75 | | | | | | | | | | |--- avg_price_per_room <= 63.25 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 63.25 | | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 66.75 | | | | | | | | | | |--- avg_price_per_room <= 73.53 | | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 73.53 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | |--- avg_price_per_room > 201.50 | | | | | | |--- arrival_month <= 10.50 | | | | | | | |--- weights: [0.00, 17.00] class: 1.0 | | | | | | |--- arrival_month > 10.50 | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | |--- length_stay > 5.50 | | | | | |--- avg_price_per_room <= 115.50 | | | | | | |--- length_stay <= 14.50 | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | |--- lead_time <= 3.50 | | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 3.50 | | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | | |--- length_stay <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- length_stay > 11.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | | |--- lead_time <= 75.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 75.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | |--- arrival_month > 7.50 | | | | | | 
| | |--- avg_price_per_room <= 70.42 | | | | | | | | | |--- weights: [34.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 70.42 | | | | | | | | | |--- avg_price_per_room <= 71.42 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 71.42 | | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | |--- length_stay > 14.50 | | | | | | | |--- weights: [0.00, 7.00] class: 1.0 | | | | | |--- avg_price_per_room > 115.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | |--- length_stay <= 10.00 | | | | | | | | | |--- weights: [0.00, 43.00] class: 1.0 | | | | | | | | |--- length_stay > 10.00 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | |--- lead_time > 90.50 | | | | |--- lead_time <= 117.50 | | | | | |--- avg_price_per_room <= 93.58 | | | | | | |--- avg_price_per_room <= 75.07 | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | |--- avg_price_per_room <= 58.75 | | | | | | | | | |--- weights: [14.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 58.75 | | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | | |--- lead_time <= 104.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 104.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | | |--- avg_price_per_room <= 61.50 | | | | | | | | | | | |--- 
truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 61.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | |--- arrival_month > 7.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- avg_price_per_room <= 66.50 | | | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- avg_price_per_room > 66.50 | | | | | | | | | | |--- length_stay <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- length_stay > 4.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [32.00, 0.00] class: 0.0 | | | | | | |--- avg_price_per_room > 75.07 | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | |--- avg_price_per_room <= 88.50 | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 80.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 80.50 | | | | | | | | | | | |--- weights: [23.00, 0.00] class: 0.0 | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | |--- weights: [50.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 88.50 | | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- arrival_month > 3.50 | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- weights: [0.00, 11.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | 
| | | |--- avg_price_per_room <= 86.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 86.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- lead_time <= 112.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- lead_time > 112.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | |--- avg_price_per_room > 93.58 | | | | | | |--- no_of_adults <= 1.50 | | | | | | | |--- length_stay <= 3.50 | | | | | | | | |--- avg_price_per_room <= 117.50 | | | | | | | | | |--- repeated_guest <= 0.50 | | | | | | | | | | |--- weights: [0.00, 59.00] class: 1.0 | | | | | | | | | |--- repeated_guest > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 117.50 | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | |--- length_stay <= 2.50 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | | | | | |--- length_stay > 2.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | |--- length_stay <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- length_stay > 2.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- length_stay > 3.50 | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | |--- no_of_adults > 1.50 | | | | | | | |--- avg_price_per_room <= 108.50 | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | |--- avg_price_per_room <= 101.12 | | | | | | | | | | |--- lead_time <= 110.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- lead_time > 110.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 101.12 | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0.0 | | | | | | | | |--- arrival_month > 9.50 | | | 
| | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | | |--- weights: [0.00, 47.00] class: 1.0 | | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- avg_price_per_room > 108.50 | | | | | | | | |--- lead_time <= 104.00 | | | | | | | | | |--- avg_price_per_room <= 177.83 | | | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | | | |--- weights: [45.00, 0.00] class: 0.0 | | | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 177.83 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- lead_time > 104.00 | | | | | | | | | |--- avg_price_per_room <= 110.86 | | | | | | | | | | |--- weights: [0.00, 12.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 110.86 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | |--- lead_time > 117.50 | | | | | |--- no_of_adults <= 1.50 | | | | | | |--- avg_price_per_room <= 122.00 | | | | | | | |--- weights: [141.00, 0.00] class: 0.0 | | | | | | |--- avg_price_per_room > 122.00 | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | |--- no_of_adults > 1.50 | | | | | | |--- avg_price_per_room <= 89.88 | | | | | | | |--- lead_time <= 125.50 | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | |--- lead_time <= 123.50 | | | | | | | | | | |--- avg_price_per_room <= 82.88 | | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 82.88 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 123.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | |--- lead_time <= 122.00 | | | | | | | | | | 
|--- avg_price_per_room <= 63.12 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 63.12 | | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1.0 | | | | | | | | | |--- lead_time > 122.00 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | |--- lead_time > 125.50 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [65.00, 0.00] class: 0.0 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- length_stay <= 2.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | |--- length_stay > 2.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | |--- avg_price_per_room > 89.88 | | | | | | | |--- avg_price_per_room <= 96.45 | | | | | | | | |--- avg_price_per_room <= 94.75 | | | | | | | | | |--- lead_time <= 125.50 | | | | | | | | | | |--- length_stay <= 2.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | | | | |--- length_stay > 2.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 125.50 | | | | | | | | | | |--- lead_time <= 138.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 138.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- avg_price_per_room > 94.75 | | | | | | | | | |--- arrival_month <= 7.00 | | | | | | | | | | |--- weights: [0.00, 11.00] class: 1.0 | | | | | | | | | |--- arrival_month > 7.00 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- avg_price_per_room > 96.45 | | | | | | | | |--- length_stay <= 1.50 | | | 
| | | | | | |--- weights: [0.00, 6.00] class: 1.0 | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | |--- lead_time <= 150.50 | | | | | | | | | | |--- avg_price_per_room <= 97.41 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 97.41 | | | | | | | | | | | |--- weights: [60.00, 0.00] class: 0.0 | | | | | | | | | |--- lead_time > 150.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time <= 13.50 | | | | |--- avg_price_per_room <= 202.67 | | | | | |--- lead_time <= 3.50 | | | | | | |--- arrival_month <= 5.50 | | | | | | | |--- length_stay <= 6.50 | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | |--- weights: [67.00, 0.00] class: 0.0 | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | |--- lead_time <= 0.50 | | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- lead_time > 0.50 | | | | | | | | | | |--- length_stay <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- length_stay > 2.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | |--- length_stay > 6.50 | | | | | | | | |--- weights: [0.00, 4.00] class: 1.0 | | | | | | |--- arrival_month > 5.50 | | | | | | | |--- length_stay <= 12.00 | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | |--- avg_price_per_room <= 76.35 | | | | | | | | | | |--- avg_price_per_room <= 74.40 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 74.40 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- avg_price_per_room > 76.35 | | | | | | | | | | |--- avg_price_per_room <= 118.04 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- avg_price_per_room > 118.04 | | 
| | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | |--- avg_price_per_room <= 178.00 | | | | | | | | | | |--- lead_time <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 1.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- avg_price_per_room > 178.00 | | | | | | | | | | |--- lead_time <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | | |--- lead_time > 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | |--- length_stay > 12.00 | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | |--- lead_time > 3.50 | | | | | | |--- avg_price_per_room <= 99.38 | | | | | | | |--- avg_price_per_room <= 78.90 | | | | | | | | |--- length_stay <= 15.00 | | | | | | | | | |--- length_stay <= 7.50 | | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | | |--- weights: [84.00, 0.00] class: 0.0 | | | | | | | | | |--- length_stay > 7.50 | | | | | | | | | | |--- lead_time <= 7.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | | |--- lead_time > 7.50 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | | | |--- length_stay > 15.00 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | |--- avg_price_per_room > 78.90 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | |--- weights: [23.00, 0.00] class: 0.0 | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | |--- length_stay <= 6.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- length_stay > 6.50 | | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1.0 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [42.00, 0.00] class: 0.0 | | | | 
| | |--- avg_price_per_room > 99.38 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 119.25 | | | | | | | | | | |--- avg_price_per_room <= 117.25 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- avg_price_per_room > 117.25 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 119.25 | | | | | | | | | | |--- avg_price_per_room <= 129.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- avg_price_per_room > 129.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- lead_time <= 9.50 | | | | | | | | | |--- lead_time <= 5.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 5.50 | | | | | | | | | | |--- avg_price_per_room <= 160.17 | | | | | | | | | | | |--- weights: [41.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 160.17 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- lead_time > 9.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- lead_time <= 10.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 10.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [10.00, 0.00] class: 0.0 | | | | |--- avg_price_per_room > 202.67 | | | | | |--- arrival_month <= 11.50 | | | | | | |--- weights: [0.00, 32.00] class: 1.0 | | | | | |--- arrival_month > 11.50 | | | | | | |--- weights: [1.00, 0.00] class: 
0.0 | | | |--- lead_time > 13.50 | | | | |--- avg_price_per_room <= 105.27 | | | | | |--- avg_price_per_room <= 60.07 | | | | | | |--- lead_time <= 84.50 | | | | | | | |--- lead_time <= 51.50 | | | | | | | | |--- lead_time <= 50.50 | | | | | | | | | |--- avg_price_per_room <= 21.67 | | | | | | | | | | |--- weights: [19.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 21.67 | | | | | | | | | | |--- avg_price_per_room <= 49.84 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 49.84 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- lead_time > 50.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | |--- lead_time > 51.50 | | | | | | | | |--- weights: [32.00, 0.00] class: 0.0 | | | | | | |--- lead_time > 84.50 | | | | | | | |--- lead_time <= 87.50 | | | | | | | | |--- weights: [0.00, 3.00] class: 1.0 | | | | | | | |--- lead_time > 87.50 | | | | | | | | |--- length_stay <= 8.00 | | | | | | | | | |--- avg_price_per_room <= 59.43 | | | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | | | |--- weights: [12.00, 0.00] class: 0.0 | | | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- avg_price_per_room > 59.43 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- length_stay > 8.00 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | |--- avg_price_per_room > 60.07 | | | | | | |--- lead_time <= 25.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | |--- weights: [29.00, 0.00] class: 0.0 | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | |--- avg_price_per_room <= 69.16 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 69.16 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | 
| | |--- arrival_month > 4.50 | | | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [54.00, 0.00] class: 0.0 | | | | | | |--- lead_time > 25.50 | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 71.92 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- avg_price_per_room > 71.92 | | | | | | | | | | | |--- truncated branch of depth 23 | | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | | |--- weights: [15.00, 0.00] class: 0.0 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | |--- arrival_month <= 5.00 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | | |--- arrival_month > 5.00 | | | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | | | |--- weights: [0.00, 35.00] class: 1.0 | | | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 90.20 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- avg_price_per_room > 90.20 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- avg_price_per_room <= 74.53 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- avg_price_per_room > 74.53 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | |--- 
weights: [6.00, 0.00] class: 0.0 | | | | |--- avg_price_per_room > 105.27 | | | | | |--- required_car_parking_space <= 0.50 | | | | | | |--- arrival_month <= 10.50 | | | | | | | |--- avg_price_per_room <= 195.30 | | | | | | | | |--- lead_time <= 54.50 | | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | | |--- lead_time <= 38.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | | |--- lead_time > 38.50 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | |--- lead_time > 54.50 | | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | | |--- lead_time <= 135.50 | | | | | | | | | | | |--- truncated branch of depth 21 | | | | | | | | | | |--- lead_time > 135.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | | |--- lead_time <= 59.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 59.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | |--- avg_price_per_room > 195.30 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- lead_time <= 59.50 | | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1.0 | | | | | | | | | | |--- lead_time > 59.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- weights: [0.00, 92.00] class: 1.0 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | |--- arrival_month > 10.50 | | | | | | | |--- lead_time <= 22.50 | | | | | | | | |--- no_of_adults <= 1.50 
| | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1.0 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | |--- weights: [22.00, 0.00] class: 0.0 | | | | | | | |--- lead_time > 22.50 | | | | | | | | |--- avg_price_per_room <= 168.06 | | | | | | | | | |--- avg_price_per_room <= 147.75 | | | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- avg_price_per_room > 147.75 | | | | | | | | | | |--- weights: [0.00, 15.00] class: 1.0 | | | | | | | | |--- avg_price_per_room > 168.06 | | | | | | | | | |--- length_stay <= 8.50 | | | | | | | | | | |--- lead_time <= 80.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 80.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- length_stay > 8.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | |--- required_car_parking_space > 0.50 | | | | | | |--- length_stay <= 11.00 | | | | | | | |--- weights: [39.00, 0.00] class: 0.0 | | | | | | |--- length_stay > 11.00 | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | |--- no_of_special_requests_log > 0.35 | | |--- no_of_special_requests_log <= 0.90 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | |--- lead_time <= 102.50 | | | | | | |--- length_stay <= 15.00 | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | |--- lead_time <= 91.50 | | | | | | | | | |--- avg_price_per_room <= 129.50 | | | | | | | | | | |--- weights: [848.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 129.50 | | | | | | | | | | |--- avg_price_per_room <= 131.50 | | | | | | | | | | | |--- truncated branch of depth 
2 | | | | | | | | | | |--- avg_price_per_room > 131.50 | | | | | | | | | | | |--- weights: [27.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 91.50 | | | | | | | | | |--- no_of_children_log <= 0.35 | | | | | | | | | | |--- weights: [43.00, 0.00] class: 0.0 | | | | | | | | | |--- no_of_children_log > 0.35 | | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | |--- length_stay <= 4.50 | | | | | | | | | |--- weights: [12.00, 0.00] class: 0.0 | | | | | | | | |--- length_stay > 4.50 | | | | | | | | | |--- lead_time <= 35.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | |--- lead_time > 35.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | |--- length_stay > 15.00 | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | |--- lead_time > 102.50 | | | | | | |--- lead_time <= 104.50 | | | | | | | |--- lead_time <= 103.50 | | | | | | | | |--- no_of_children_log <= 0.35 | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0 | | | | | | | | |--- no_of_children_log > 0.35 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | |--- lead_time > 103.50 | | | | | | | | |--- weights: [0.00, 3.00] class: 1.0 | | | | | | |--- lead_time > 104.50 | | | | | | | |--- avg_price_per_room <= 141.75 | | | | | | | | |--- lead_time <= 150.50 | | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | | |--- avg_price_per_room <= 81.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- avg_price_per_room > 81.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | | |--- weights: [27.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 150.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 
0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | |--- avg_price_per_room > 141.75 | | | | | | | | |--- lead_time <= 110.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | |--- lead_time > 110.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | |--- lead_time <= 63.00 | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | |--- weights: [18.00, 0.00] class: 0.0 | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | |--- length_stay <= 1.50 | | | | | | | | |--- weights: [2.00, 1.00] class: 0.0 | | | | | | | |--- length_stay > 1.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | |--- lead_time > 63.00 | | | | | | |--- weights: [0.00, 6.00] class: 1.0 | | | |--- market_segment_type_Online > 0.50 | | | | |--- lead_time <= 8.50 | | | | | |--- lead_time <= 4.50 | | | | | | |--- length_stay <= 14.00 | | | | | | | |--- avg_price_per_room <= 219.86 | | | | | | | | |--- length_stay <= 6.50 | | | | | | | | | |--- avg_price_per_room <= 157.64 | | | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- avg_price_per_room > 157.64 | | | | | | | | | | |--- avg_price_per_room <= 158.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | | |--- avg_price_per_room > 158.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | |--- length_stay > 6.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0.0 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- arrival_month <= 
10.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- avg_price_per_room > 219.86 | | | | | | | | |--- arrival_month <= 6.00 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- arrival_month > 6.00 | | | | | | | | | |--- avg_price_per_room <= 237.25 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 237.25 | | | | | | | | | | |--- room_type_reserved_Room_Type 6 <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | | |--- room_type_reserved_Room_Type 6 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | |--- length_stay > 14.00 | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | |--- lead_time > 4.50 | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | |--- avg_price_per_room <= 123.60 | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | |--- avg_price_per_room <= 88.76 | | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | | |--- weights: [32.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 88.76 | | | | | | | | | | |--- avg_price_per_room <= 91.22 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- avg_price_per_room > 91.22 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | |--- weights: [95.00, 0.00] class: 0.0 | | | | | | | |--- avg_price_per_room > 123.60 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 124.05 | | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | | |--- weights: 
[2.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 124.05 | | | | | | | | | | |--- arrival_month <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_month > 2.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 128.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 128.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- weights: [14.00, 0.00] class: 0.0 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | |--- length_stay <= 3.50 | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | |--- length_stay > 3.50 | | | | | | | | |--- weights: [0.00, 3.00] class: 1.0 | | | | |--- lead_time > 8.50 | | | | | |--- required_car_parking_space <= 0.50 | | | | | | |--- avg_price_per_room <= 127.62 | | | | | | | |--- lead_time <= 43.50 | | | | | | | | |--- length_stay <= 9.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | | |--- weights: [87.00, 0.00] class: 0.0 | | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | | |--- truncated branch of depth 23 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [127.00, 0.00] class: 0.0 | | | | | | | | |--- length_stay > 9.50 | | | | | | | | | |--- lead_time <= 29.50 | | | | | | | | | | |--- avg_price_per_room <= 76.22 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 76.22 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 
29.50 | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1.0 | | | | | | | |--- lead_time > 43.50 | | | | | | | | |--- length_stay <= 10.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- avg_price_per_room <= 76.54 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- avg_price_per_room > 76.54 | | | | | | | | | | | |--- truncated branch of depth 23 | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | |--- length_stay > 10.50 | | | | | | | | | |--- weights: [0.00, 6.00] class: 1.0 | | | | | | |--- avg_price_per_room > 127.62 | | | | | | | |--- lead_time <= 142.50 | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | |--- avg_price_per_room <= 179.62 | | | | | | | | | | |--- lead_time <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- lead_time > 11.50 | | | | | | | | | | | |--- truncated branch of depth 20 | | | | | | | | | |--- avg_price_per_room > 179.62 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- lead_time <= 139.50 | | | | | | | | | | | |--- truncated branch of depth 20 | | | | | | | | | | |--- lead_time > 139.50 | | | | | | | | | | | |--- weights: [10.00, 0.00] class: 0.0 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- lead_time <= 100.50 | | | | | | | | | | | |--- weights: [49.00, 0.00] class: 0.0 | | | | | | | | | | |--- lead_time > 100.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- lead_time > 142.50 | | | | | | | | |--- 
avg_price_per_room <= 142.65 | | | | | | | | | |--- arrival_month <= 10.00 | | | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- arrival_month > 10.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- avg_price_per_room > 142.65 | | | | | | | | | |--- avg_price_per_room <= 182.49 | | | | | | | | | | |--- weights: [0.00, 11.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 182.49 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | |--- required_car_parking_space > 0.50 | | | | | | |--- room_type_reserved_Room_Type 7 <= 0.50 | | | | | | | |--- weights: [180.00, 0.00] class: 0.0 | | | | | | |--- room_type_reserved_Room_Type 7 > 0.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | |--- no_of_special_requests_log > 0.90 | | | |--- lead_time <= 90.50 | | | | |--- length_stay <= 12.00 | | | | | |--- length_stay <= 4.50 | | | | | | |--- length_stay <= 3.50 | | | | | | | |--- weights: [1689.00, 0.00] class: 0.0 | | | | | | |--- length_stay > 3.50 | | | | | | | |--- room_type_reserved_Room_Type 6 <= 0.50 | | | | | | | | |--- avg_price_per_room <= 90.05 | | | | | | | | | |--- lead_time <= 48.00 | | | | | | | | | | |--- arrival_month <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_month > 2.50 | | | | | | | | | | | |--- weights: [61.00, 0.00] class: 0.0 | | | | | | | | | |--- lead_time > 48.00 | | | | | | | | | | |--- avg_price_per_room <= 89.85 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- avg_price_per_room > 89.85 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- 
avg_price_per_room > 90.05 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- lead_time <= 54.50 | | | | | | | | | | | |--- weights: [221.00, 0.00] class: 0.0 | | | | | | | | | | |--- lead_time > 54.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- lead_time <= 28.50 | | | | | | | | | | | |--- weights: [15.00, 0.00] class: 0.0 | | | | | | | | | | |--- lead_time > 28.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- room_type_reserved_Room_Type 6 > 0.50 | | | | | | | | |--- lead_time <= 31.00 | | | | | | | | | |--- weights: [13.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 31.00 | | | | | | | | | |--- avg_price_per_room <= 159.42 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 159.42 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | |--- length_stay > 4.50 | | | | | | |--- no_of_special_requests_log <= 1.24 | | | | | | | |--- length_stay <= 6.50 | | | | | | | | |--- avg_price_per_room <= 92.33 | | | | | | | | | |--- avg_price_per_room <= 90.95 | | | | | | | | | | |--- lead_time <= 54.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 54.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 90.95 | | | | | | | | | | |--- lead_time <= 11.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | | | |--- lead_time > 11.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- avg_price_per_room > 92.33 | | | | | | | | | |--- lead_time <= 80.50 | | | | | | | | | | |--- lead_time <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 11.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- lead_time > 80.50 | | | | | | | | | | |--- 
lead_time <= 81.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | | |--- lead_time > 81.50 | | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0.0 | | | | | | | |--- length_stay > 6.50 | | | | | | | | |--- lead_time <= 9.00 | | | | | | | | | |--- weights: [13.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 9.00 | | | | | | | | | |--- lead_time <= 34.50 | | | | | | | | | | |--- avg_price_per_room <= 83.24 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1.0 | | | | | | | | | | |--- avg_price_per_room > 83.24 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- lead_time > 34.50 | | | | | | | | | | |--- lead_time <= 72.50 | | | | | | | | | | | |--- weights: [19.00, 0.00] class: 0.0 | | | | | | | | | | |--- lead_time > 72.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | |--- no_of_special_requests_log > 1.24 | | | | | | | |--- weights: [69.00, 0.00] class: 0.0 | | | | |--- length_stay > 12.00 | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | |--- lead_time > 90.50 | | | | |--- avg_price_per_room <= 202.95 | | | | | |--- arrival_month <= 8.50 | | | | | | |--- lead_time <= 150.50 | | | | | | | |--- length_stay <= 5.50 | | | | | | | | |--- avg_price_per_room <= 80.33 | | | | | | | | | |--- avg_price_per_room <= 76.37 | | | | | | | | | | |--- weights: [22.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 76.37 | | | | | | | | | | |--- lead_time <= 98.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | | |--- lead_time > 98.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- avg_price_per_room > 80.33 | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | |--- lead_time <= 115.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | | |--- lead_time > 115.00 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | 
| | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | |--- length_stay > 5.50 | | | | | | | | |--- no_of_children_log <= 0.35 | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | | |--- lead_time <= 142.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 142.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- no_of_children_log > 0.35 | | | | | | | | | |--- no_of_special_requests_log <= 1.24 | | | | | | | | | | |--- lead_time <= 105.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | | |--- lead_time > 105.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- no_of_special_requests_log > 1.24 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | |--- lead_time > 150.50 | | | | | | | |--- avg_price_per_room <= 103.50 | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | |--- avg_price_per_room > 103.50 | | | | | | | | |--- weights: [0.00, 3.00] class: 1.0 | | | | | |--- arrival_month > 8.50 | | | | | | |--- no_of_special_requests_log <= 1.24 | | | | | | | |--- avg_price_per_room <= 90.42 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- lead_time <= 107.00 | | | | | | | | | | |--- avg_price_per_room <= 70.52 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 70.52 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- lead_time > 107.00 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- 
weights: [0.00, 1.00] class: 1.0 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- lead_time <= 101.00 | | | | | | | | | | |--- weights: [11.00, 0.00] class: 0.0 | | | | | | | | | |--- lead_time > 101.00 | | | | | | | | | | |--- lead_time <= 104.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | | |--- lead_time > 104.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | |--- avg_price_per_room > 90.42 | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | |--- weights: [11.00, 0.00] class: 0.0 | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | |--- avg_price_per_room <= 153.15 | | | | | | | | | | |--- avg_price_per_room <= 92.60 | | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 92.60 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | |--- avg_price_per_room > 153.15 | | | | | | | | | | |--- lead_time <= 100.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 100.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | |--- no_of_special_requests_log > 1.24 | | | | | | | |--- weights: [52.00, 0.00] class: 0.0 | | | | |--- avg_price_per_room > 202.95 | | | | | |--- weights: [0.00, 7.00] class: 1.0 |--- lead_time > 151.50 | |--- avg_price_per_room <= 100.04 | | |--- no_of_special_requests_log <= 0.35 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- no_of_adults <= 1.50 | | | | | |--- lead_time <= 163.50 | | | | | | |--- length_stay <= 3.50 | | | | | | | |--- length_stay <= 2.50 | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | | |--- length_stay > 2.50 | | | | | | | | |--- weights: [1.00, 1.00] class: 0.0 | | | | | | |--- length_stay > 3.50 | | | | | | | |--- weights: [0.00, 15.00] class: 1.0 | | | | | |--- lead_time > 163.50 | | | | 
| | |--- lead_time <= 341.00 | | | | | | | |--- lead_time <= 173.00 | | | | | | | | |--- avg_price_per_room <= 97.50 | | | | | | | | | |--- length_stay <= 3.00 | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1.0 | | | | | | | | | |--- length_stay > 3.00 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 97.50 | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | |--- weights: [61.00, 6.00] class: 0.0 | | | | | | | |--- lead_time > 173.00 | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | |--- avg_price_per_room <= 88.00 | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 88.00 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1.0 | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | |--- avg_price_per_room <= 98.00 | | | | | | | | | | |--- avg_price_per_room <= 55.21 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 55.21 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- avg_price_per_room > 98.00 | | | | | | | | | | |--- lead_time <= 231.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | | |--- lead_time > 231.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | |--- lead_time > 341.00 | | | | | | | |--- length_stay <= 5.50 | | | | | | | | |--- lead_time <= 402.00 | | | | | | | | | |--- avg_price_per_room <= 80.00 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 80.00 | | | | | | | | | | |--- lead_time <= 381.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 381.50 | | | | | | | | | | | |--- weights: [3.00, 2.00] class: 0.0 | | | | | | | | |--- lead_time > 402.00 | | | | | | | | | |--- weights: [0.00, 3.00] 
class: 1.0 | | | | | | | |--- length_stay > 5.50 | | | | | | | | |--- avg_price_per_room <= 88.33 | | | | | | | | | |--- weights: [0.00, 7.00] class: 1.0 | | | | | | | | |--- avg_price_per_room > 88.33 | | | | | | | | | |--- weights: [1.00, 1.00] class: 0.0 | | | | |--- no_of_adults > 1.50 | | | | | |--- avg_price_per_room <= 84.58 | | | | | | |--- lead_time <= 244.00 | | | | | | | |--- length_stay <= 2.50 | | | | | | | | |--- lead_time <= 166.50 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 166.50 | | | | | | | | | |--- lead_time <= 229.50 | | | | | | | | | | |--- avg_price_per_room <= 69.34 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 69.34 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 229.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- length_stay > 2.50 | | | | | | | | |--- avg_price_per_room <= 27.07 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- avg_price_per_room > 27.07 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 66.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 66.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | |--- lead_time > 244.00 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- avg_price_per_room <= 75.83 | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 66.00 | | | | | | | | | | | |--- weights: [0.00, 8.00] class: 1.0 | | | | | | | | | | |--- avg_price_per_room > 66.00 | | | | | | | | | | | |--- weights: [19.00, 0.00] class: 0.0 | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | |--- length_stay <= 6.00 | | | | | | | | | | | 
|--- truncated branch of depth 7 | | | | | | | | | | |--- length_stay > 6.00 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 75.83 | | | | | | | | | |--- lead_time <= 292.50 | | | | | | | | | | |--- length_stay <= 6.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- length_stay > 6.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 292.50 | | | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | | | |--- weights: [0.00, 23.00] class: 1.0 | | | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [37.00, 0.00] class: 0.0 | | | | | |--- avg_price_per_room > 84.58 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | |--- lead_time <= 316.00 | | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 316.00 | | | | | | | | | |--- lead_time <= 338.00 | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0.0 | | | | | | | | | |--- lead_time > 338.00 | | | | | | | | | | |--- weights: [1.00, 5.00] class: 1.0 | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | |--- weights: [6.00, 0.00] class: 0.0 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- weights: [9.00, 0.00] class: 0.0 | | | |--- market_segment_type_Online > 0.50 | | | | |--- avg_price_per_room <= 2.50 | | | | | |--- no_of_adults <= 1.50 | | | | | | |--- lead_time <= 285.50 | | | | | | | |--- weights: [11.00, 0.00] class: 0.0 | | | | | 
| |--- lead_time > 285.50 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | |--- no_of_adults > 1.50 | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | |--- avg_price_per_room > 2.50 | | | | | |--- arrival_month <= 11.50 | | | | | | |--- weights: [0.00, 525.00] class: 1.0 | | | | | |--- arrival_month > 11.50 | | | | | | |--- length_stay <= 3.50 | | | | | | | |--- lead_time <= 204.00 | | | | | | | | |--- weights: [0.00, 11.00] class: 1.0 | | | | | | | |--- lead_time > 204.00 | | | | | | | | |--- lead_time <= 214.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 214.50 | | | | | | | | | |--- lead_time <= 275.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1.0 | | | | | | | | | |--- lead_time > 275.50 | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1.0 | | | | | | |--- length_stay > 3.50 | | | | | | | |--- avg_price_per_room <= 80.51 | | | | | | | | |--- weights: [0.00, 41.00] class: 1.0 | | | | | | | |--- avg_price_per_room > 80.51 | | | | | | | | |--- avg_price_per_room <= 81.43 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 81.43 | | | | | | | | | |--- weights: [0.00, 13.00] class: 1.0 | | |--- no_of_special_requests_log > 0.35 | | | |--- market_segment_type_Offline <= 0.50 | | | | |--- lead_time <= 180.50 | | | | | |--- lead_time <= 159.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- lead_time <= 152.50 | | | | | | | | |--- avg_price_per_room <= 90.81 | | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 
| | | | | | | | | | | |--- weights: [1.00, 2.00] class: 1.0 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 90.81 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | |--- lead_time > 152.50 | | | | | | | | |--- lead_time <= 156.50 | | | | | | | | | |--- weights: [12.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 156.50 | | | | | | | | | |--- length_stay <= 4.50 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | | | | |--- length_stay > 4.50 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- avg_price_per_room <= 87.12 | | | | | | | | |--- lead_time <= 158.50 | | | | | | | | | |--- weights: [0.00, 7.00] class: 1.0 | | | | | | | | |--- lead_time > 158.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- avg_price_per_room > 87.12 | | | | | | | | |--- avg_price_per_room <= 89.75 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 89.75 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | |--- lead_time > 159.50 | | | | | | |--- no_of_adults <= 0.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | |--- no_of_adults > 0.50 | | | | | | | |--- avg_price_per_room <= 93.44 | | | | | | | | |--- length_stay <= 5.50 | | | | | | | | | |--- lead_time <= 162.50 | | | | | | | | | | |--- lead_time <= 161.50 | | | | | | | | | | | |--- weights: [6.00, 
0.00] class: 0.0 | | | | | | | | | | |--- lead_time > 161.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- lead_time > 162.50 | | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | | |--- weights: [52.00, 0.00] class: 0.0 | | | | | | | | |--- length_stay > 5.50 | | | | | | | | | |--- avg_price_per_room <= 88.38 | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 88.38 | | | | | | | | | | |--- avg_price_per_room <= 90.92 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | | |--- avg_price_per_room > 90.92 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | |--- avg_price_per_room > 93.44 | | | | | | | | |--- lead_time <= 178.50 | | | | | | | | | |--- avg_price_per_room <= 93.67 | | | | | | | | | | |--- length_stay <= 5.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | | |--- length_stay > 5.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 93.67 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- lead_time > 178.50 | | | | | | | | | |--- lead_time <= 179.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1.0 | | | | | | | | | |--- lead_time > 179.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- weights: [4.00, 1.00] class: 0.0 | | | | |--- lead_time > 180.50 | | | | | |--- length_stay <= 3.50 | | | | | | |--- no_of_special_requests_log <= 1.24 | | | | | | | |--- 
lead_time <= 187.50 | | | | | | | | |--- arrival_month <= 4.00 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | |--- arrival_month > 4.00 | | | | | | | | | |--- avg_price_per_room <= 78.30 | | | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 78.30 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- weights: [0.00, 20.00] class: 1.0 | | | | | | | |--- lead_time > 187.50 | | | | | | | | |--- lead_time <= 304.50 | | | | | | | | | |--- avg_price_per_room <= 78.90 | | | | | | | | | | |--- lead_time <= 237.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 237.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- avg_price_per_room > 78.90 | | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | |--- lead_time > 304.50 | | | | | | | | | |--- arrival_month <= 9.00 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | |--- arrival_month > 9.00 | | | | | | | | | | |--- weights: [0.00, 17.00] class: 1.0 | | | | | | |--- no_of_special_requests_log > 1.24 | | | | | | | |--- weights: [11.00, 0.00] class: 0.0 | | | | | |--- length_stay > 3.50 | | | | | | |--- length_stay <= 13.50 | | | | | | | |--- no_of_special_requests_log <= 1.24 | | | | | | | | |--- avg_price_per_room <= 68.32 | | | | | | | | | |--- arrival_month <= 11.00 | | | | | | | | | | |--- weights: [13.00, 0.00] class: 0.0 | | | | | | | | | |--- arrival_month > 11.00 | | | | | | | | | | |--- 
lead_time <= 247.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 247.00 | | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 68.32 | | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | | |--- avg_price_per_room <= 81.12 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 81.12 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | |--- avg_price_per_room <= 70.89 | | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1.0 | | | | | | | | | | |--- avg_price_per_room > 70.89 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | |--- no_of_special_requests_log > 1.24 | | | | | | | | |--- weights: [17.00, 0.00] class: 0.0 | | | | | | |--- length_stay > 13.50 | | | | | | | |--- weights: [0.00, 5.00] class: 1.0 | | | |--- market_segment_type_Offline > 0.50 | | | | |--- lead_time <= 368.00 | | | | | |--- lead_time <= 348.50 | | | | | | |--- no_of_adults <= 2.50 | | | | | | | |--- length_stay <= 7.50 | | | | | | | | |--- lead_time <= 331.00 | | | | | | | | | |--- no_of_special_requests_log <= 0.90 | | | | | | | | | | |--- weights: [137.00, 0.00] class: 0.0 | | | | | | | | | |--- no_of_special_requests_log > 0.90 | | | | | | | | | | |--- length_stay <= 5.50 | | | | | | | | | | | |--- weights: [12.00, 0.00] class: 0.0 | | | | | | | | | | |--- length_stay > 5.50 | | | | | | | | | | | |--- weights: [2.00, 1.00] class: 0.0 | | | | | | | | |--- lead_time > 331.00 | | | | | | | | | |--- lead_time <= 336.50 | | | | | | | | | | |--- weights: [2.00, 1.00] class: 0.0 | | | | | | | | | |--- lead_time > 336.50 | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0.0 | | | | | | | |--- length_stay > 7.50 | | | | | | | | |--- avg_price_per_room <= 80.74 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 80.74 
| | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | |--- no_of_adults > 2.50 | | | | | | | |--- lead_time <= 196.00 | | | | | | | | |--- weights: [7.00, 0.00] class: 0.0 | | | | | | | |--- lead_time > 196.00 | | | | | | | | |--- no_of_special_requests_log <= 0.90 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | |--- no_of_special_requests_log > 0.90 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | |--- lead_time > 348.50 | | | | | | |--- avg_price_per_room <= 58.50 | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | |--- avg_price_per_room > 58.50 | | | | | | | |--- weights: [6.00, 2.00] class: 0.0 | | | | |--- lead_time > 368.00 | | | | | |--- lead_time <= 381.50 | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | |--- lead_time > 381.50 | | | | | | |--- weights: [1.00, 1.00] class: 0.0 | |--- avg_price_per_room > 100.04 | | |--- arrival_month <= 11.50 | | | |--- no_of_special_requests_log <= 1.24 | | | | |--- weights: [0.00, 2108.00] class: 1.0 | | | |--- no_of_special_requests_log > 1.24 | | | | |--- weights: [31.00, 0.00] class: 0.0 | | |--- arrival_month > 11.50 | | | |--- no_of_special_requests_log <= 0.35 | | | | |--- weights: [47.00, 0.00] class: 0.0 | | | |--- no_of_special_requests_log > 0.35 | | | | |--- lead_time <= 289.50 | | | | | |--- no_of_special_requests_log <= 0.90 | | | | | | |--- avg_price_per_room <= 114.59 | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | |--- avg_price_per_room > 114.59 | | | | | | | |--- weights: [0.00, 6.00] class: 1.0 | | | | | |--- no_of_special_requests_log > 0.90 | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | |--- avg_price_per_room <= 110.46 | | | | | | | | |--- lead_time <= 206.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 206.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | |--- avg_price_per_room > 110.46 | | | | | | | | |--- weights: 
[7.00, 0.00] class: 0.0 | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | |--- lead_time > 289.50 | | | | | |--- weights: [0.00, 7.00] class: 1.0
# checking out what variables are being prioritized by the model.
print(
    pd.DataFrame(
        dTree.feature_importances_, columns=["Imp"], index=X_train.columns
    ).sort_values(by="Imp", ascending=False)
)
                                              Imp
lead_time                                 0.39708
avg_price_per_room                        0.20728
market_segment_type_Online                0.09275
arrival_month                             0.08443
length_stay                               0.07326
no_of_special_requests_log                0.06831
no_of_adults                              0.02970
type_of_meal_plan_Not Selected            0.01111
room_type_reserved_Room_Type 4            0.00820
required_car_parking_space                0.00738
no_of_children_log                        0.00590
type_of_meal_plan_Meal Plan 2             0.00456
market_segment_type_Offline               0.00352
room_type_reserved_Room_Type 2            0.00224
room_type_reserved_Room_Type 5            0.00171
room_type_reserved_Room_Type 6            0.00075
market_segment_type_Corporate             0.00069
repeated_guest                            0.00047
room_type_reserved_Room_Type 7            0.00034
no_of_previous_cancellations_log          0.00032
room_type_reserved_Room_Type 3            0.00000
market_segment_type_Complementary         0.00000
no_of_previous_bookings_not_canceled_log  0.00000
type_of_meal_plan_Meal Plan 3             0.00000
importances = dTree.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12,12))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [the_features[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
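As a rough way to read the ranking above, the importances can be accumulated from the top to see how few features carry most of the tree's split quality. The sketch below hard-codes the leading values printed for `dTree`; the helper name `features_needed` and the 90% threshold are illustrative assumptions, not part of the original analysis.

```python
# Importances copied from the printed output for dTree (leading entries only;
# every feature left out here contributes less than 0.02).
importances = {
    "lead_time": 0.39708,
    "avg_price_per_room": 0.20728,
    "market_segment_type_Online": 0.09275,
    "arrival_month": 0.08443,
    "length_stay": 0.07326,
    "no_of_special_requests_log": 0.06831,
    "no_of_adults": 0.02970,
}


def features_needed(imp, threshold):
    """Return the top-ranked features whose summed importance first reaches
    `threshold` (or all features if the threshold is never reached)."""
    ranked = sorted(imp.items(), key=lambda kv: kv[1], reverse=True)
    total, chosen = 0.0, []
    for name, value in ranked:
        chosen.append(name)
        total += value
        if total >= threshold:
            break
    return chosen, total


top, covered = features_needed(importances, 0.90)
print(len(top), "features cover", round(covered, 5), "of the total importance")
```

Read this way, a small group of booking attributes (lead time, price, market segment, arrival month, length of stay, special requests) accounts for most of what the unpruned tree uses.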
# Pre-prune the model with the max_depth hyperparameter
dTree1 = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=1)
dTree1.fit(X_train, y_train)
DecisionTreeClassifier(max_depth=3, random_state=1)
# The accuracy on the pre pruned tree.
print("Accuracy on training set : ",dTree1.score(X_train, y_train))
print("Accuracy on test set : ",dTree1.score(X_test, y_test))
Accuracy on training set :  0.7844202898550725
Accuracy on test set :  0.7913259211614444
# Let's see the pre pruned tree
plt.figure(figsize=(15,10))
tree.plot_tree(dTree1,feature_names=the_features,filled=True,fontsize=9,node_ids=True,class_names=True)
plt.show()
print(tree.export_text(dTree1,feature_names=the_features,show_weights=True))
|--- lead_time <= 151.50
|   |--- no_of_special_requests_log <= 0.35
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- weights: [4614.00, 781.00] class: 0.0
|   |   |--- market_segment_type_Online > 0.50
|   |   |   |--- weights: [2504.00, 2768.00] class: 1.0
|   |--- no_of_special_requests_log > 0.35
|   |   |--- no_of_special_requests_log <= 0.90
|   |   |   |--- weights: [5624.00, 1055.00] class: 0.0
|   |   |--- no_of_special_requests_log > 0.90
|   |   |   |--- weights: [2919.00, 145.00] class: 0.0
|--- lead_time > 151.50
|   |--- avg_price_per_room <= 100.04
|   |   |--- no_of_special_requests_log <= 0.35
|   |   |   |--- weights: [694.00, 1242.00] class: 1.0
|   |   |--- no_of_special_requests_log > 0.35
|   |   |   |--- weights: [586.00, 249.00] class: 0.0
|   |--- avg_price_per_room > 100.04
|   |   |--- arrival_month <= 11.50
|   |   |   |--- weights: [31.00, 2108.00] class: 1.0
|   |   |--- arrival_month > 11.50
|   |   |   |--- weights: [57.00, 15.00] class: 0.0
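The leaf weights in this text dump are enough to recover the training accuracy reported earlier for `dTree1`. The sketch below is a hand check rather than part of the original workflow: it assumes the weights are raw row counts (which should hold here because `dTree1` was fit without `class_weight` or sample weights), and the helper `gini` is introduced only for illustration.

```python
# (class 0, class 1) training counts for the eight leaves of dTree1,
# copied from the export_text output above.
leaves = [
    (4614, 781), (2504, 2768), (5624, 1055), (2919, 145),
    (694, 1242), (586, 249), (31, 2108), (57, 15),
]


def gini(counts):
    """Gini impurity 1 - sum(p_k^2) of a node with the given class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


n_total = sum(sum(leaf) for leaf in leaves)
# Each leaf predicts its majority class, so it gets max(counts) rows right.
n_correct = sum(max(leaf) for leaf in leaves)
train_accuracy = n_correct / n_total

for leaf in leaves:
    predicted = leaf.index(max(leaf))
    print(leaf, "-> class", predicted, "| gini", round(gini(leaf), 3))
print("training accuracy from leaves:", round(train_accuracy, 4))
```

The result agrees with the 0.7844 training accuracy printed by `dTree1.score`, and the per-leaf impurities show which leaves are still heavily mixed, for example the `market_segment_type_Online > 0.50` leaf with 2504 against 2768 rows.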
# Looking at the feature importances of this model
importances = dTree1.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(10,10))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [the_features[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
clf = DecisionTreeClassifier(random_state=1, class_weight="balanced")
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = abs(path.ccp_alphas), path.impurities
pd.DataFrame(path)
| | ccp_alphas | impurities |
|---|---|---|
| 0 | 0.00000 | 0.01030 |
| 1 | 0.00000 | 0.01030 |
| 2 | 0.00000 | 0.01030 |
| 3 | 0.00000 | 0.01030 |
| 4 | 0.00000 | 0.01030 |
| ... | ... | ... |
| 2055 | 0.00890 | 0.32806 |
| 2056 | 0.00980 | 0.33786 |
| 2057 | 0.01272 | 0.35058 |
| 2058 | 0.03412 | 0.41882 |
| 2059 | 0.08118 | 0.50000 |
2060 rows × 2 columns
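One way to read this table: in minimal cost-complexity pruning, each step collapses the subtree with the smallest effective alpha, and the total leaf impurity rises by that alpha times the number of leaves removed. The sketch below checks this relationship on the last five rows shown above. The values are copied by hand and rounded to five decimals, so the match is approximate.

```python
# Last five rows of the pruning path, copied from the table above
# (rounded to five decimals, so the results are approximate).
ccp_alphas_tail = [0.00890, 0.00980, 0.01272, 0.03412, 0.08118]
impurities_tail = [0.32806, 0.33786, 0.35058, 0.41882, 0.50000]

# For each pruning step: impurity increase / alpha ~= number of leaves removed.
leaves_removed = []
for i in range(1, len(ccp_alphas_tail)):
    increase = impurities_tail[i] - impurities_tail[i - 1]
    leaves_removed.append(round(increase / ccp_alphas_tail[i]))

print(leaves_removed)
```

The final step removes a single leaf and ends at an impurity of 0.5, which is what a tree reduced to its root node would give with balanced class weights.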
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
plt.show()
clfs = []
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(
        random_state=1, ccp_alpha=ccp_alpha, class_weight="balanced"
    )
    clf.fit(X_train, y_train)
    clfs.append(clf)
print(
    "Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
        clfs[-1].tree_.node_count, ccp_alphas[-1]
    )
)
Number of nodes in the last tree is: 1 with ccp_alpha: 0.08117914389136888
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]
node_counts = [clf.tree_.node_count for clf in clfs]
depth = [clf.tree_.max_depth for clf in clfs]
fig, ax = plt.subplots(2, 1, figsize=(10, 7))
ax[0].plot(ccp_alphas, node_counts, marker="o", drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of nodes vs alpha")
ax[1].plot(ccp_alphas, depth, marker="o", drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
fig.tight_layout()
F1 Score vs alpha for training and testing sets
f1_train = []
for clf in clfs:
    pred_train = clf.predict(X_train)
    values_train = f1_score(y_train, pred_train)
    f1_train.append(values_train)
f1_test = []
for clf in clfs:
    pred_test = clf.predict(X_test)
    values_test = f1_score(y_test, pred_test)
    f1_test.append(values_test)
fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("F1 Score")
ax.set_title("F1 Score vs alpha for training and testing sets")
ax.plot(ccp_alphas, f1_train, marker="o", label="train", drawstyle="steps-post")
ax.plot(ccp_alphas, f1_test, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
index_best_model = np.argmax(f1_test)
best_model = clfs[index_best_model]
print(best_model)
DecisionTreeClassifier(ccp_alpha=6.68270542106583e-05, class_weight='balanced',
random_state=1)
comparison_frame = pd.DataFrame({'Model':['Initial decision tree model','Decision tree with restricted maximum depth','Decision tree with hyperparameter tuning',
                                          'Decision tree with post-pruning'], 'Train_Recall':[.981,.732,.732,.979], 'Test_Recall':[.792,.739,.739,.794]})
comparison_frame
| | Model | Train_Recall | Test_Recall |
|---|---|---|---|
| 0 | Initial decision tree model | 0.98100 | 0.79200 |
| 1 | Decision tree with restricted maximum depth | 0.73200 | 0.73900 |
| 2 | Decision tree with hyperparameter tuning | 0.73200 | 0.73900 |
| 3 | Decision tree with post-pruning | 0.97900 | 0.79400 |
The trees with restricted maximum depth and with hyperparameter tuning performed best at reducing overfitting: their train and test recall are almost identical (0.732 vs 0.739). I would submit one of those models to the client.
The default decision tree was prone to overfitting, which is why its F1 score on the training set is greater than on the test set, and pre-pruning fixed that overfitting. The post-pruned decision tree ended up with the highest F1 score on the test set, which makes it the best choice on that metric alone.
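To make the overfitting comparison concrete, the sketch below computes the train-minus-test gap for each model. It only reuses the recall values from the comparison table. The dictionary layout and the name `gaps` are illustrative and are not part of the notebook.

```python
# Recall values copied from the comparison table: (train, test).
results = {
    "Initial decision tree model": (0.981, 0.792),
    "Decision tree with restricted maximum depth": (0.732, 0.739),
    "Decision tree with hyperparameter tuning": (0.732, 0.739),
    "Decision tree with post-pruning": (0.979, 0.794),
}

# Train-minus-test gap: a large positive gap suggests overfitting.
gaps = {name: round(train - test, 3) for name, (train, test) in results.items()}

# Print the models from smallest to largest gap.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: gap={gap:+.3f}")
```

On these numbers the two pre-pruned trees show essentially no gap, while the initial and post-pruned trees both score close to 0.19 higher on the training set than on the test set.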
The three most important variables for predicting cancellations were lead time (how far in advance the room was booked), the number of special requests for the stay, and the average price per room. Bookings made 151 days (about 5 months) or less in advance were much less likely to be cancelled, and guests who also made a special request were very unlikely to cancel, which I believe is an opportunity. Bookings made more than 151 days in advance were more likely to be cancelled, and price was the determining factor for those cancellations: the likelihood of a cancellation increased when the room was priced above 100.04 euros. This leads me to believe that these guests booked early and subsequently found a better deal.
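The claims above can be checked roughly against the leaf counts of the pre-pruned tree. The sketch below is a hand-copied approximation, not part of the original analysis. It assumes the printed weights are raw training-row counts and that class 1 means cancelled, and it follows the reading used above that `no_of_special_requests_log > 0.35` corresponds to at least one special request. The helpers `cancel_rate` and `select` are hypothetical names introduced here.

```python
# Leaf counts copied by hand from the pre-pruned tree printout.
# Each entry: (first split, second split, class 0 count, class 1 count).
# Assumption: counts are raw training rows and class 1 = cancelled.
leaves = [
    ("lead_time <= 151.50", "no_of_special_requests_log <= 0.35", 4614, 781),
    ("lead_time <= 151.50", "no_of_special_requests_log <= 0.35", 2504, 2768),
    ("lead_time <= 151.50", "no_of_special_requests_log > 0.35", 5624, 1055),
    ("lead_time <= 151.50", "no_of_special_requests_log > 0.35", 2919, 145),
    ("lead_time > 151.50", "avg_price_per_room <= 100.04", 694, 1242),
    ("lead_time > 151.50", "avg_price_per_room <= 100.04", 586, 249),
    ("lead_time > 151.50", "avg_price_per_room > 100.04", 31, 2108),
    ("lead_time > 151.50", "avg_price_per_room > 100.04", 57, 15),
]


def cancel_rate(rows):
    """Share of class 1 (assumed: cancelled) bookings among the given leaves."""
    not_cancelled = sum(row[2] for row in rows)
    cancelled = sum(row[3] for row in rows)
    return cancelled / (not_cancelled + cancelled)


def select(first=None, second=None):
    """Leaves matching the given first and/or second split condition."""
    return [
        row for row in leaves
        if (first is None or row[0] == first)
        and (second is None or row[1] == second)
    ]


rate_short = cancel_rate(select(first="lead_time <= 151.50"))
rate_long = cancel_rate(select(first="lead_time > 151.50"))
rate_short_request = cancel_rate(select(second="no_of_special_requests_log > 0.35"))
rate_long_pricey = cancel_rate(select(second="avg_price_per_room > 100.04"))

print(f"lead time <= 151.5 days:          {rate_short:.1%}")
print(f"lead time  > 151.5 days:          {rate_long:.1%}")
print(f"short lead + special request:     {rate_short_request:.1%}")
print(f"long lead + price above 100.04:   {rate_long_pricey:.1%}")
```

Under those assumptions, about 23% of the short-lead training bookings were cancelled compared with about 73% of the long-lead ones, and the long-lead, higher-priced branch is cancelled in roughly 96% of cases.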
My Recommendations